Preface



No matter what your perspective is, what your goals are, or how experienced you 
are, Artificial Life research is always a learning experience. The variety of phe- 
nomena that the people who gathered in Lausanne reported and discussed for the 
fifth time since 1991 at the European Conference on Artificial Life (ECAL) has 
not been programmed, crafted, or assembled by analytic design. It has evolved, 
emerged, or appeared spontaneously from a process of artificial evolution, self- 
organisation, or development. 

Artificial Life is a field where biological and artificial sciences meet and blend 
together, where the dynamics of biological life are reproduced in the memory of 
computers, where machines evolve, behave, and communicate like living organisms,
where complex life-like entities are synthesised from electronic chromosomes
and artificial chemistries. The impact of Artificial Life on science, philosophy,
and technology is tremendous. Over the years the synthetic approach
has established itself as a powerful method for investigating several complex 
phenomena of life. From a philosophical standpoint, the notion of life and of in- 
telligence is continuously reformulated in relation to the dynamics of the system 
under observation and to the embedding environment, no longer a privilege of 
carbon-based entities with brains and eyes. At the same time, the possibility of 
engineering machines and software with life-like properties such as evolvability, 
self-repair, and self-maintenance is gradually becoming a reality, bringing new
perspectives in engineering and applications. 

All these aspects, and many more, are reflected in the 90 papers presented 
at ECAL’99 from 13 to 17 September 1999 and collected in this volume. Each 
paper has been carefully reviewed by three members of the scientific committee 
(see list following the Preface) and selected from among 150 submissions. Of the 
selected and revised papers, 50 have been accepted as long oral contributions 
and the remaining 40 as short poster contributions. In both cases, the overriding 
selection criteria have been scientific and methodological soundness, novelty, 
and potential for future developments. In addition to the contributed papers, 
this volume includes the abstracts of four keynote lectures (H. Meinhardt, W. 
D. Hamilton, L. Steels, and T. Lenton) and two invited talks (D. Mange and D. 
Thalmann). At the end of each abstract, the reader will find a list of the most 
relevant references for these talks. 

In addition to single-track presentations, demonstrations, and satellite work- 
shops, the first day of the conference was dedicated to a series of tutorials cover- 
ing genotype-phenotype mappings, collective intelligence, cellular automata and 
complex systems, synthetic actors, evolutionary robotics, and artificial chem- 
istry. 

Contributed and invited papers have been classified according to the following 
broad categories. 
Epistemology is concerned with the philosophical aspects of Artificial Life.
The two selected papers address two key concepts in Artificial Life: what an
emergent phenomenon could be and when an entity may be defined as alive.

Evolutionary Dynamics (17 papers) addresses a number of fundamental is- 
sues in natural and artificial evolution. These include the interactions between 
evolution and other forms of ontogenetic dynamics, the role and effect of muta- 
tions, development of synthetic organisms, and measures of diversity and com- 
plexity in evolving systems. 

Evolutionary Cybernetics (12 papers) covers the artificial evolution of mechanisms
and structures that support the behaviour of biological and artificial organisms
with a sensorimotor system. The papers included in this section employ this
methodology for both understanding the development and functioning of biological brains
and for synthesising control systems of autonomous robots and of other artificial 
creatures. 

Bio-Inspired Robotics and Autonomous Agents (15 papers) is a collection of 
contributions describing recent efforts in building physical and virtual agents 
with life-like properties, such as bio-inspired control, adaptation, human and 
animal morphologies, and behavioural autonomy. Some of these papers go as far 
as addressing motivation, emotions, and economic behaviour. 

Self-Replication, Self-Maintenance, and Gene Expression (16 papers) includes
papers that investigate some fundamental properties of micro-entities such as 
RNA, DNA, cells, and cellular aggregates. These micro-entities are capable of 
self-replication, self-maintenance, evolution, and development into full organ- 
isms from a set of genetic instructions. While some of the authors attempt to 
understand these principles with mathematical models and computer simula- 
tions, others incorporate them into a new generation of bio-inspired electronic 
circuits capable of complex behaviours. 

Societies and Collective Behaviours (17 papers) display complex dynamics 
that cannot be understood by looking at single individuals in isolation. Sex, coop- 
eration, selfishness, teaching, cultural transmission, distributed problem solving, 
or simple interference are some of the phenomena that one observes in assem- 
blies of natural and artificial organisms. The papers in this section attempt to 
understand under which conditions these phenomena arise, when they develop, 
and how one could exploit them to create societies of artificial agents and robots 
capable of performing complex tasks. 

Communication and Language (13 papers) goes one step further and inves- 
tigates the emergence and role of communication in societies of organisms and 
intelligent machines. In most cases communication is considered a dynamic phe- 
nomenon arising in populations of organisms that evolve, learn, and dynamically 
form temporal aggregations. Within this conceptual framework, some papers ad- 
dress the origin of language in its many manifestations, ranging from speech to 
lexicon and syntax. 

ECAL’99 was selected as the 1999 International EPFL-Latsis Foundation 
Conference. Generous sponsorship by the Latsis Foundation allowed us to in- 
vite high-quality keynote lecturers, offer several student fellowships, and make 
sure that the necessary resources were available for a successful organisation.
Additional sponsors are listed on the next page. 

The organisers warmly thank Monique Dubois and Joseba Urzelai for their
assistance. Monique ensured a smooth and professional organisation from day
one, taking care of every single detail with precision and patience, while Joseba
handled all the intricacies of electronic submissions and e-mails. Marie-Jo Pellaud
kindly helped out whenever it was necessary. Our thanks also go to Luigi
Pagliarini for the cover art and to Mark Peden for the caterpillar depicted in it.

Lausanne, June 1999 Dario Floreano 

Jean-Daniel Nicoud 
Francesco Mondada 
From Fertilized Eggs to Complex Organisms:
Models of Biological Pattern Formation 



Hans Meinhardt 

Max Planck-Institut fuer Entwicklungsbiologie 
D-72076 Tuebingen, Germany 
hans.meinhardt@tuebingen.mpg.de



The development of an organism from a single cell to its full spatial complexity is
a most fascinating phenomenon. Until about 1984, most information was obtained
from perturbations of normal development and the observation of the
subsequent regulation. Based on these observations, we have elaborated
molecularly plausible interactions that reproduce in computer simulations
the observed dynamic regulation in great detail. The proposed interactions have
since found direct support from molecular-genetic investigations. The following
processes are proposed to play a crucial role:

(i) Primary pattern formation is accomplished by short-range autocatalysis
and long-range inhibition. This leads to local concentration maxima that can
act as organizing regions. Monotonically graded, spatially periodic, or
stripe-like distributions are possible.

(ii) Cells respond to such signals by attaining a stable state of
differentiation. This is achieved by direct or indirect autoregulation of genes.
If the signal is above a threshold, the corresponding feedback loop is turned
on. Owing to the autoregulation, maintaining this activity no longer requires
the inducing signal: a cell remembers the signals it has received in its
history. Many autoregulatory genes have since been found.

(iii) Segmentation, a basic pattern in all higher organisms, depends on mutual
long-range stabilization of cell states that locally exclude each other. Cell
types that form neighbouring structures need each other in a symbiotic manner.
This leads to a controlled neighbourhood of structures.

(iv) Legs or wings are initiated at boundaries between regions of different
determination. If the production of a new morphogen requires the co-operation
of two differently determined cell types, their common border will become a new
signalling centre. The local morphogen concentration provides a measure of the
distance from this border. The intersection of two such borders defines unique
points and completely new co-ordinate systems for the initiation of these
structures. The model accounts for the correct initiation of legs and wings at
particular positions of the organism and for their correct orientation with
respect to the main body axes of the embryo. This model has found direct
support from observations in vertebrates and insects.
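Point (i) can be illustrated with a minimal one-dimensional simulation of an activator-inhibitor system of the Gierer-Meinhardt type. The dimensionless equations, the 40:1 ratio of diffusion constants, and the other parameter values below are illustrative assumptions, not the models used in the talk. They are chosen so that the uniform state is stable without diffusion but becomes unstable once the inhibitor spreads much faster than the activator.

```python
import numpy as np


def activator_inhibitor(n=100, steps=10_000, dt=0.01, d_a=1.0, d_h=40.0, seed=0):
    """1D activator-inhibitor sketch on a ring of n cells (grid spacing 1).

    da/dt = a^2/h - a  + d_a * laplacian(a)   (short-range autocatalysis)
    dh/dt = a^2   - 2h + d_h * laplacian(h)   (long-range inhibition)
    The uniform steady state is a = h = 2.
    """
    rng = np.random.default_rng(seed)
    a = 2.0 + 0.01 * rng.standard_normal(n)  # small random perturbation
    h = np.full(n, 2.0)

    def laplacian(u):
        # periodic boundary: neighbours wrap around the ring
        return np.roll(u, 1) + np.roll(u, -1) - 2.0 * u

    for _ in range(steps):
        da = a * a / h - a + d_a * laplacian(a)
        dh = a * a - 2.0 * h + d_h * laplacian(h)
        a, h = a + dt * da, h + dt * dh  # explicit Euler step (dt * d_h <= 0.5)
    return a, h


a, h = activator_inhibitor()
peaks = int(np.sum((a > np.roll(a, 1)) & (a > np.roll(a, -1)) & (a > a.mean())))
print("activator maxima:", peaks)
```

Starting from almost uniform concentrations, the run ends with several activator maxima separated by troughs. Reducing `d_h` to, say, 5 makes the uniform state stable again, so the small perturbations simply decay: the pattern depends on the inhibitor having the longer range.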

Patterns on shells of tropical molluscs are an especially instructive exam- 
ple for pattern formation in space and time since they preserve the complete 
history of their generation. Mechanisms derived from this patterning provide a 
better understanding of processes that seem completely unrelated, such as the
guidance of a growing nerve or blood coagulation. This suggests that similar reactions
are used in a toolbox-like fashion again and again in different situations of
development. 

References 

1. H. Meinhardt. Biological Pattern Formation: New Observations Provide Support
for Theoretical Predictions. BioEssays, 16:627-632, 1994.

2. H. Meinhardt. The Algorithmic Beauty of Sea Shells. Springer Verlag, Heidelberg, 
Fables of Cyberspace: Tapeworms, Horses, and Mountains



William D. Hamilton 



Department of Zoology 
University of Oxford 
OX1 3PS Oxford, UK
william.hamilton@zoology.oxford.ac.uk



If microevolution is hill-climbing, then macroevolution is valley-crossing leading 
to higher hills. Does sexuality, making reproduction a reticulate process instead 
of bifurcating, help macroevolution? Even without valleys, if a niche AB exists 
but cannot be accessed until either niche A or niche B is conquered, it may appear 
that sexuals should reach AB sooner because discoveries made by selection in A 
and B separately can be brought together by recombination: in contrast under 
asexuality the route to AB has to be either through A or through B, not both 
at once, and should take more time. Moreover, if valleys on the adaptive surface 
exist, the force of this presumption is increased. Depending on size of population 
and depth of valleys, a higher hill may not be accessible by adaptation at all; 
but with sexuality present at least the chance of attaining the high hill’s slope 
seems inevitably greater. All this, however, leaves out many potentially relevant 
factors, including the intrinsic inefficiency of sexual reproduction. The latter makes
sex a strongly inferior competitor to asex in the short term as well as making it 
(advantageously?) slower to climb slopes once they are found. 
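The recombination argument can be made concrete with a two-locus haploid cross, a textbook simplification used here only for illustration (it is not a model from the talk). An individual carrying discovery A (genotype `Ab`) is crossed with one carrying discovery B (`aB`); with recombination fraction `r`, the recombinant offspring `AB` and `ab` each appear with probability r/2.

```python
from fractions import Fraction


def offspring_distribution(parent1: str, parent2: str, r: Fraction) -> dict[str, Fraction]:
    """Offspring genotype frequencies of a haploid two-locus cross.

    Genotypes are two-letter strings, one letter per locus (e.g. "Ab").
    r is the probability that the two loci recombine.
    """
    half = Fraction(1, 2)
    dist: dict[str, Fraction] = {}
    # non-recombinant offspring inherit one parental haplotype intact
    for parent in (parent1, parent2):
        dist[parent] = dist.get(parent, Fraction(0)) + (1 - r) * half
    # recombinants take locus 1 from one parent and locus 2 from the other
    for first, second in ((parent1, parent2), (parent2, parent1)):
        child = first[0] + second[1]
        dist[child] = dist.get(child, Fraction(0)) + r * half
    return dist


print(offspring_distribution("Ab", "aB", Fraction(1, 2))["AB"])  # free recombination
print(offspring_distribution("Ab", "aB", Fraction(0))["AB"])     # no recombination
```

With free recombination (r = 1/2) a quarter of the offspring combine both discoveries in a single generation; with r = 0, as under clonal reproduction, none do, and AB can only arise through a further mutation within an A or a B lineage.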

In the living world, how is sexuality maintained, and what is the evidence that it
can help macroevolution? For the silicon world, how and when should AI or GA
specialists consider imposing sexuality, and what recombination values and spatial
genomic arrangements should they choose? For the present I can only make sug- 
gestions based on what I know of the history and achievements of real life and 
from very few tentative computer simulations by myself and others. Mainly I 
will explain a theme that living systems inevitably evolve or acquire parasites and
will describe how these, while always benefiting themselves to the detriment of 
host reproduction in the short term, in the long term, if by specialism and good 
chance not causing extermination, may actually aid hosts. In particular I will 
explain how infecting and coevolving parasites induce a permanent coevolution- 
ary dynamic as well as often a demographic one. This can maintain sexuality 
and thereby, as simulation evidence shows, can aid valley-crossing. It does so by 
(i) allowing partial adaptive discoveries made in different places to be combined, 
and (ii) keeping adaptive topographies in constant change, in effect empowering
a Sewall Wright shifting balance, but more strongly and determinately than his
genetic drift.







Sexuality is not the only way real life became reticulate, but, except in pro- 
caryotes, the other processes, which involve temporary or permanent adoption 
of elsewhere-evolved tricks or whole cooperations of whole organisms, are much 
more occasional. Examples here are the horizontal transfer of plasmids and virus- 
borne (or otherwise vectored) bits of code, and the adoption of the mitochondrion
and/or the plastid by eucaryote ancestors. Smallness of adventive units
(probably parasites initially) and their final complete enslavement and depen- 
dence induced by the larger organism seem the important and rare features 
here: when occasionally the features are attained, however, the consequences for 
macroevolution can be vast. 

As to higher levels of organismic interaction involving separate and un- 
enslaved organisms, as for example with claims for evolution of cooperative 
ecosystems or of ’gaias’ as wholes, it is not yet clear to what extent natural 
selection can work effectively. In the ecosystem case ’reticulation’ is extreme 
but ’reproduction’ is correspondingly ill-defined; nevertheless, suggestive mixed-species
’superorganisms’ not involving permanent enclosure do sometimes seem
to arise (e.g. lichens, corals, bogs). In the case of suggested ’gaias’, any sense of
reproduction has to be still less traditional; nevertheless viewing selection sim- 
ply as the addition of compatible species types into the gaia and as occurring 
on a time scale not of generations but of the additions themselves, some weak and
non-Darwinian ’evolution’ may seemingly still occur. What effectiveness this has 
(or what use it can be in the in silico world), however, is not yet apparent. 
Throughout this summer, we are conducting a large-scale experiment in which an
open-ended population of situated embodied agents bootstrap an ontology and 
a lexicon from scratch. The agents can teleport between different robot bodies 
through which they can experience and act upon the world. The agents play a 
language game, called the guessing game, in which one agent (the hearer) tries 
to identify through verbal and non-verbal hints from another agent (the speaker) 
a particular object in the environment. Humans can interact with the artificial 
agents through the Internet. They can construct their own agent, influence its 
language, and follow the performance of the agent as it travels between different 
physical locations. 

The experiment relies heavily on key concepts from artificial life. The lexicon 
self-organises based on a positive feedback loop between use and success, similar 
to the way a path in an ant society forms. The ontology develops through a se- 
lectionist process in each agent; distinctions grow in a partly random fashion and 
get pruned if they are not useful and effective in conceptualisation or communi- 
cation. The agents are grounded through a behavior-based robotic architecture 
in their environments. 
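The positive feedback between use and success can be isolated in a deliberately stripped-down toy. The guessing game described above also involves perception, conceptualisation, and robot bodies; the sketch below drops all of that and keeps only a single shared object, in the style of a minimal "naming game". The rules (invent a word when you have none, collapse to the winning word on success, adopt the word on failure) are an illustrative assumption, not the architecture of the experiment.

```python
import random


def naming_game(n_agents: int, max_rounds: int, seed: int = 0):
    """Return (round at which all agents share one word, that word set),
    or (None, None) if consensus is not reached within max_rounds."""
    rng = random.Random(seed)
    inventories = [set() for _ in range(n_agents)]
    next_word = 0
    for round_no in range(1, max_rounds + 1):
        speaker, hearer = rng.sample(range(n_agents), 2)
        if not inventories[speaker]:
            # invent a new word for the object
            inventories[speaker].add(f"w{next_word}")
            next_word += 1
        # sorted() keeps the run reproducible despite set ordering
        word = rng.choice(sorted(inventories[speaker]))
        if word in inventories[hearer]:
            # success reinforces the word: both agents drop competing synonyms
            inventories[speaker] = {word}
            inventories[hearer] = {word}
        else:
            # failure: the hearer adopts the word, raising its future use
            inventories[hearer].add(word)
        if all(len(inv) == 1 and inv == inventories[0] for inv in inventories):
            return round_no, inventories[0]
    return None, None


rounds, lexicon = naming_game(n_agents=10, max_rounds=100_000)
print(rounds, lexicon)
```

Synonyms proliferate early on and are then damped: every success makes the winning word more likely to be used and to succeed again, until a single shared word remains.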

The talk will focus on the collective semiotic dynamics that we effectively 
are observing during this experiment. We expect to see damping of synonymy 
and polysemy as well as constant lexical and ontological evolution as the agents 
cope with new situations in the environment or as new agents or humans inject 
new language forms in the language games. 
The Gaia hypothesis postulates that the Earth is a self-regulating system that
tends to maintain conditions habitable for life. The hypothesis grew from the 
recognition that organisms have fundamentally altered their planetary environ- 
ment - its thermodynamic state, the composition of the atmosphere, the climate, 
ocean chemistry and the land surface. Changes in the environment have in turn 
altered organisms’ growth and the forces of natural selection acting upon them 
and this implies inevitable feedback connections between life and its environ- 
ment. The resulting world seems remarkably well suited to carbon-based life, 
and has some important stabilising properties. For example, the Earth’s climate 
has remained habitable despite a 25% increase in the luminosity of the sun since 
the origin of life. Known feedback mechanisms more often show a tendency to 
regulate the environment than to destabilise it. Thus there is arguably an overall 
tendency for the Earth system to counteract gradual forcing and rapid perturba- 
tions that tend to drive the environment away from a habitable state. However, 
it is not obvious why this should necessarily be the case. If the Earth did not have 
some regulatory properties, we would probably not be here to remark on them, 
because life would have perished before evolution produced conscious observers. 
Furthermore, Gaia poses a puzzle when viewed from a Darwinian perspective: 
Why should the organisms that leave the most descendants be the ones that con- 
tribute to regulating their planetary environment? These questions highlight the 
need to test a ’general’ Gaia hypothesis: that once life has originated on a planet
it will contribute to its own persistence. The ’specific’ Gaia hypothesis (that 
which applies specifically to the Earth) is notoriously difficult to test directly, 
because it involves long time-scales and large space scales. However, the general 
hypothesis may be tested in some sense with artificial life on artificial worlds. The 
Daisyworld model first showed that planetary self-regulation can occur without 
teleology (foresight or planning on the part of unconscious organisms). However, 
the model is a special case in that the organisms alter their environment in the 
same way at the individual and the global levels and there is no evolution. A 
series of models will be presented which adapt the simple Daisyworld framework 
to explore different special cases, and then introduce mutation of the organ- 
isms’ environment-altering properties. It has been suggested that adaptation to 
prevailing environmental conditions tends to undermine any environmental reg- 
ulation, and the evolutionary trade-off between altering one’s environment and 
adapting to prevailing conditions will be considered. However, a more general
model is still required to test whether, when organisms evolve new ways of alter- 
ing their environment, there is any tendency towards self-regulation of the whole 
system of organisms plus environment. Thus, artificial life researchers may be 
well placed to help test the Gaia theory. 
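For readers unfamiliar with Daisyworld, here is a minimal sketch of the original, non-evolving version: white and black daisies compete for space, and their albedos alter the planetary temperature that in turn sets their growth. The constants are values commonly quoted for this model and should be read as assumptions; keeping a small seed area for each daisy type is an implementation convenience, not part of the abstract.

```python
# Commonly quoted Daisyworld constants (treat as illustrative assumptions).
SIGMA = 5.67e-8   # Stefan-Boltzmann constant, W m^-2 K^-4
FLUX = 917.0      # solar flux at luminosity 1, W m^-2
Q = 2.06e9        # local heat-transfer parameter, K^4
DEATH = 0.3       # daisy death rate
ALBEDO_GROUND, ALBEDO_WHITE, ALBEDO_BLACK = 0.5, 0.75, 0.25


def growth(temp_k: float) -> float:
    """Parabolic growth rate: 1 at 295.5 K, zero about 17.5 K away."""
    return max(0.0, 1.0 - 0.003265 * (295.5 - temp_k) ** 2)


def planet_temp(luminosity: float, albedo: float) -> float:
    """Radiative-equilibrium temperature of the whole planet, in kelvin."""
    return (FLUX * luminosity * (1.0 - albedo) / SIGMA) ** 0.25


def albedo_of(white: float, black: float) -> float:
    bare = 1.0 - white - black
    return bare * ALBEDO_GROUND + white * ALBEDO_WHITE + black * ALBEDO_BLACK


def daisyworld(luminosity: float, steps: int = 4000, dt: float = 0.05, seed_area: float = 0.01):
    """Euler-integrate daisy cover; return (white, black, planet temperature)."""
    white = black = seed_area
    for _ in range(steps):
        bare = 1.0 - white - black
        albedo = albedo_of(white, black)
        te4 = planet_temp(luminosity, albedo) ** 4
        # light patches are locally cooler, dark patches locally warmer
        t_white = (Q * (albedo - ALBEDO_WHITE) + te4) ** 0.25
        t_black = (Q * (albedo - ALBEDO_BLACK) + te4) ** 0.25
        white += dt * white * (bare * growth(t_white) - DEATH)
        black += dt * black * (bare * growth(t_black) - DEATH)
        # keep a small seed population so a type can recolonise later
        white, black = max(white, seed_area), max(black, seed_area)
    return white, black, planet_temp(luminosity, albedo_of(white, black))


print("bare planet:", round(planet_temp(1.0, ALBEDO_GROUND), 1), "K")
print("with daisies:", daisyworld(1.0))
```

Sweeping `luminosity` over a range of values is the usual way to see the regulation: over a wide window the daisy cover shifts so that the planetary temperature changes much less than it would on a bare planet.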
Abstract. The field of artificial life (Alife) is replete with documented
instances of emergence, though debate still persists as to the meaning 
of this term. In the absence of a formal definition, researchers in the 
field would be well served by adopting an emergence certification mark 
which would garner approval from the Alife community. We propose 
an emergence test, consisting of three criteria — design, observation, and 
surprise — for conferring the emergence label. 

1 Introduction 

When a bank’s accounting program goes seemingly independent and does its own 
thing, the programmer scratches his head, sighs, and prepares for doing overtime 
with the debugger. But when a society of agents does something surprising, Alife 
researchers may solemnly document this “emergent behavior,” and move on to 
other issues without always seeking to determine the cause of their observations. 
Indeed, overly facile use of the term emergence has made it controversial. Arkin 
recently observed that: 

Emergence is often invoked in an almost mystical sense regarding the ca- 
pabilities of behavior-based systems. Emergent behavior implies a holis- 
tic capability where the sum is considerably greater than its parts. It 
is true that what occurs in a behavior-based system is often a surprise 
to the system’s designer, but does the surprise come because of a short- 
coming of the analysis of the constituent behavioral building blocks and 
their coordination, or because of something else? [1] (page 105)

Altogether, it seems the emergence tag has become a great attention grabber, 
thanks to the striking behaviors demonstrated in artificial-life experiments. We 
do not think, however, that emergence should be diagnosed ipso facto whenever 
the unexpected intrudes into the visual field of the experimenter; nor should 
the diagnosis of emergence immediately justify an economy of explanation. Such 
abuse and overuse of the term will eventually devalue its significance, and bring 
work centered on emergence into disrepute. Therefore, we contend that in the 
absence of an acceptable definition, researchers in the field would be well served 
by adopting an emergence certification mark which would garner approval from 
the Alife community. 
Motivated by this wish to standardize the tagging task, we propose an emergence
test, namely, criteria by which one can justify conferring the emergence
label. Our criteria are motivated by an examination of published work in the 
field of Alife [10]. 

The emergence test is presented in the next section, followed in Section 3 by 
four case studies demonstrating its applicability. Finally, in Section 4, we discuss 
a number of issues pertaining to our test. 



2 An operant definition of emergence for Alife researchers 

The difficulties we face in adopting a definition of the concept of emergence 
are reminiscent of the complications faced by early Artificial Intelligence (AI) 
researchers in defining intelligence.* Nonetheless, where the equally elusive con- 
cept of intelligence is concerned, Alan Turing found a way to cut the Gordian 
knot, by means of an operant definition which is useful within the limited context 
of man-machine interaction [14]. Debate concerning the concept of intelligence is
unlikely to subside in the foreseeable future, and the same, we believe, holds for 
emergence. We deem, however, that viewing the world through Turing-colored 
glasses might improve our vision as regards the concept of emergence — at least 
where modern-day Alife practice is concerned. 

Alife is a constructive endeavor: some researchers aim at evolving patterns in 
a computer, some seek to elicit social behaviors in real-world robots, others wish 
to study life-related phenomena in a more controllable setting, while still others 
are interested in the synthesis of novel life-like systems in chemical, electronic, 
mechanical, and other artificial media. Alife is an experimental discipline, fun- 
damentally consisting of the observation of run-time behaviors, those complex 
interactions generated when populations of man-made, artificial creatures are 
immersed in real or simulated environments. Published work in the field usually 
relates the conception of a model, its instantiation into real-world or simulated 
objects, and the observed behavior of these objects in a collection of experiments. 

The Turing Test focuses on a human experimenter’s inability to discern
human from machine when holding what we would now call an Internet chat 
session. Our emergence test centers on an observer’s avowed incapacity (amaze- 
ment) to reconcile his perception of an experiment in terms of a global world 
view with his awareness of the atomic nature of the elementary interactions. 



* On the difficulties in defining emergence, Emmeche, Køppe, and Stjernfelt recently
remarked: “One reason for the widespread scepticism against the word [emergence] 
is a historical load of confusion surrounding the metaphysical aspects of the concept, 
reflected in the fact that it has been used in a long series of different ways, apparently 
making it impossible to use it as a clearly defined term...” [5] (page 84) 
Assume that the scientists attendant upon an Alife experiment are just two:
a system designer and a system observer (both of whom can in fact be one and 
the same), and that the following three conditions hold:

(i) Design. The system has been constructed by the designer, by describing
local elementary interactions between components (e.g., artificial creatures
and elements of the environment) in a language L1.

(ii) Observation. The observer is fully aware of the design, but describes global
behaviors and properties of the running system, over a period of time, using
a language L2.

(iii) Surprise. The language of design L1 and the language of observation L2
are distinct, and the causal link between the elementary interactions programmed
in L1 and the behaviors observed in L2 is non-obvious to the
observer, who therefore experiences surprise. In other words, there is a cognitive
dissonance between the observer’s mental image of the system’s design
stated in L1 and his contemporaneous observation of the system’s behavior
stated in L2.



The above three clauses relating design, observation, and surprise describe 
our conditions for diagnosing emergence, i.e., for accepting that a system is 
displaying emergent behavior. 

When assessing the surprise clause of our test one should bear in mind that as 
human beings we are quite easily surprised (as any novice magician will attest). 
The question rests rather on how evanescent the surprise effect is, i.e., how
easy (or strenuous) it is for the observer to bridge the L1-L2 gap, thus reconciling
his global view of the system with his awareness of the underlying elementary 
interactions. One can draw an analogy with the concept of intelligence and the 
Turing test: the chatty terminal might at first appear to be carrying on like an 
intelligent interlocutor, only to lose its “intelligence certificate” once the tester 
has pondered upon the true nature of the ongoing conversation. 

Some of the above points deserve further elaboration, or indeed invite debate. 
Before treating these issues in Section 4, we wish to demonstrate the application
of our test to four cases. 



3 Administering the emergence test: Four case studies 

In this section we administer the emergence test to four examples, thus demonstrating
its application (additional examples are given in [10]). Each example
ends with a “test score,” constituting our own assertion as observers of
whether we are indeed surprised, that is, of whether emergent behavior is indeed 
displayed — or not. 
1. Emergence of a nest structure in a simulated wasp colony, from the interactions taking place between individual wasps [13].

(i) The design language L1 is that of local wasp interactions, including movement
on a three-dimensional cubic lattice and placement of bricks. A wasp’s
decision is based upon the local configuration of bricks that it perceives in its
“visual” field. Actions to be taken are prewired in the form of a lookup table with
as many entries as there are stimulating configurations.

(ii) The observation language L2 is that of large-scale geometry, as employed
to describe nest architectures. 

(iii) While fully aware of the underlying wasp interaction rules, the observer 
nonetheless marvels at the sophistication of the constructions and their strik- 
ing similarity to naturally occurring nests. 

Diagnosis: emergent behavior is displayed by the nest-building wasps. 
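The design language L1 of this example, a lookup table keyed on the local brick configuration, can be sketched in a few lines. The model in [13] uses a three-dimensional lattice and its own set of stimulating configurations; the two-dimensional grid, the single rule ("build next to exactly one brick"), and all parameter values below are hypothetical simplifications chosen only to show the mechanism.

```python
import itertools
import random

MOVES = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # N, E, S, W neighbours

# Hypothetical lookup table: occupancy of (N, E, S, W) -> deposit a brick?
# Here a wasp builds only where exactly one neighbouring cell holds a brick.
RULES = {cfg: sum(cfg) == 1 for cfg in itertools.product((0, 1), repeat=4)}


def build_nest(size=15, n_wasps=5, steps=4000, seed=0):
    """Randomly walking wasps deposit bricks using only local lookups."""
    rng = random.Random(seed)
    centre = (size // 2, size // 2)
    bricks = {centre}  # start from a single seed brick
    wasps = [(rng.randrange(size), rng.randrange(size)) for _ in range(n_wasps)]
    for _ in range(steps):
        for i, (x, y) in enumerate(wasps):
            dx, dy = rng.choice(MOVES)  # random walk, clipped at the grid edge
            x = min(max(x + dx, 0), size - 1)
            y = min(max(y + dy, 0), size - 1)
            wasps[i] = (x, y)
            if (x, y) in bricks:
                continue
            # purely local decision: look up the neighbourhood configuration
            cfg = tuple(int((x + mx, y + my) in bricks) for mx, my in MOVES)
            if RULES[cfg]:
                bricks.add((x, y))
    return bricks


nest = build_nest()
print(len(nest), "bricks")
```

No wasp represents the overall shape; the global structure (here a branching, always-connected tree of bricks) is only describable in an observation language over the whole grid.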

2. Emergence of a “highway” created by the artificial Langton ant, from simple movement rules [12].

(i) The design language L1 is that of single moves of a simple, myopic ant. The
ant starts out on the central cell of a two-dimensional, rectangular lattice, 
heading in some selected direction. It moves one cell in that direction and 
looks at the color of the cell it lands on — black or white. If it lands on a 
black cell, it paints it white and turns 90 degrees to the left; if it lands on a 
white cell, it paints it black and turns 90 degrees to the right. These simple 
rules are iterated indefinitely. 

(ii) The observation language L2 is that of global behavioral patterns, extended
over time and space (i.e., tens of thousands of single ant moves, spanning
thousands of cells). Specifically, the ant was observed to construct a “highway,”
i.e., a repeating pattern of fixed width that extends indefinitely in a
specific direction (Figure 1a).

(iii) While fully aware of the very simple ant rules, the observer is nonetheless 
surprised by the appearance of a highway. 

Diagnosis: emergent behavior is displayed by the highway-constructing ant. 
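The rules in (i) transcribe almost line for line into code. The sketch below assumes a y-up coordinate frame and an initial heading of "up" (both arbitrary choices) and records the ant's position after every step. The highway is known to repeat every 104 steps, so once it has formed, the displacement over consecutive 104-step windows is identical, which gives a numerical check of what the observer sees in L2.

```python
def langton_ant(steps: int) -> list[tuple[int, int]]:
    """Return the ant's position after each step of the rules in (i)."""
    black: set[tuple[int, int]] = set()  # all cells start white
    x, y = 0, 0                          # central cell
    dx, dy = 0, 1                        # initial heading: up
    positions = []
    for _ in range(steps):
        x, y = x + dx, y + dy            # move one cell, then look at its colour
        if (x, y) in black:
            black.remove((x, y))         # black cell: paint it white ...
            dx, dy = -dy, dx             # ... and turn 90 degrees left
        else:
            black.add((x, y))            # white cell: paint it black ...
            dx, dy = dy, -dx             # ... and turn 90 degrees right
        positions.append((x, y))
    return positions


path = langton_ant(12_300)
shift = lambda p, q: (q[0] - p[0], q[1] - p[1])
print(shift(path[12_000], path[12_104]), shift(path[12_104], path[12_208]))
```

The first several thousand steps look disordered; only well beyond that does the fixed per-period shift appear, which is exactly the gap between the two descriptions that the test asks about.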



3. Emergence of flocking behavior in simulated birds, from a set of three simple steering behaviors [9].

(i) The design language L1 is that of local bird interactions, the three rules
being: separation: steer to avoid crowding local flockmates; alignment: steer 
toward the average heading of local flockmates; cohesion: steer to move to- 
ward the average position of local flockmates. A bird’s decision is based upon 
its nearby neighbors, i.e., those that are in its “visual” field. 

(ii) The observation language L2 is that of flocking behaviors, such as the flock’s
parting smoothly when faced with an obstacle and “flowing” around it, then
reuniting (Figure 1b).

(iii) While fully aware of the underlying bird interaction rules, the observer 
nonetheless marvels at the lifelike flocking behaviors. 
Fig. 1. Examples of emergence. (a) The trail created by the highway-constructing
Langton ant. (b) A flock of simulated birds parts smoothly when faced with an obstacle
and “flows” around it, then reunites (after Reynolds [9]).



Diagnosis: the flocking behavior exhibited by the artificial birds was considered 
a clear case of emergence when it was first reported upon in 1987. However, 
one could now maintain that it no longer passes the emergence test, since 
wide-spread use of this technique in computer graphics has obviated the ele- 
ment of surprise. This example demonstrates that the diagnosis of emergence 
is contingent upon the sophistication of the observer. 
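The three steering rules of (i) amount to one local velocity update per bird. The following is a minimal two-dimensional sketch; the neighbourhood radius, separation distance, rule weights, and speed cap are hypothetical values chosen for illustration, not Reynolds' parameters.

```python
import math
from dataclasses import dataclass


@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float


def boids_step(flock, radius=10.0, sep_dist=2.0, w_sep=1.5, w_ali=0.1,
               w_coh=0.01, max_speed=2.0):
    """Apply separation, alignment, and cohesion once to every boid."""
    new_flock = []
    for b in flock:
        # only flockmates within the "visual" radius are considered
        near = [o for o in flock
                if o is not b and math.hypot(o.x - b.x, o.y - b.y) < radius]
        ax = ay = 0.0
        if near:
            n = len(near)
            for o in near:  # separation: steer away from crowding flockmates
                d = math.hypot(b.x - o.x, b.y - o.y)
                if 0 < d < sep_dist:
                    ax += w_sep * (b.x - o.x) / d
                    ay += w_sep * (b.y - o.y) / d
            # alignment: steer toward the average heading of flockmates
            ax += w_ali * (sum(o.vx for o in near) / n - b.vx)
            ay += w_ali * (sum(o.vy for o in near) / n - b.vy)
            # cohesion: steer toward the average position of flockmates
            ax += w_coh * (sum(o.x for o in near) / n - b.x)
            ay += w_coh * (sum(o.y for o in near) / n - b.y)
        vx, vy = b.vx + ax, b.vy + ay
        speed = math.hypot(vx, vy)
        if speed > max_speed:  # cap the speed
            vx, vy = vx * max_speed / speed, vy * max_speed / speed
        new_flock.append(Boid(b.x + vx, b.y + vy, vx, vy))
    return new_flock


a, b = boids_step([Boid(0, 0, 1, 0), Boid(5, 0, 0, 1)])
print(a.vx * b.vx + a.vy * b.vy)  # > 0: the headings have moved closer
```

Nothing in the update mentions flocks or obstacles; flock-level behaviour appears only when many such updates are iterated, which is why its description belongs to L2.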

4. Emergence of wall-following behavior in an autonomous, mobile robot, from the simultaneous operation of two simple behavior systems: obstacle avoidance and wall seeking [11].

(i) The design language L1 is that of simple robot behaviors, including, in this
case, obstacle avoidance and wall seeking.

(ii) The observation language L2 is that of more elaborate robot behaviors,
consisting, in this case, of wall following.

(iii) Steels wrote that “Wall following is emergent in this case because the 
category ‘equidistance to the (left/right) wall’ is not explicitly sensed by the 
robot or causally used in one of the controlling behavior systems.” [11] (page 92) 

Diagnosis: Steels diagnosed emergence in this case as it accords with his own 
definition, namely, that a behavior is emergent if it necessitates the use 
of new descriptive categories that are not needed to describe the behavior 
of the constituent components [11]. While thus alluding to the language 
dichotomy rendered explicit by our definition (i.e., the existence of two distinct 
languages: that of design and that of observation), we maintain that the 
surprise element is missing: the wall-following behavior can be quite readily 
deduced by an observer aware of the two underlying simpler behaviors. We 
thus conclude that emergent behavior is not displayed by the wall-following 
robot. 
4 Discussion 

We now discuss the various components of our test. 

The operant nature of the test. In this we have drawn our inspiration from 
Turing, who, concerning intelligence, opted for an operant, informal, “social” 
definition, deliberately eschewing rigor. Turing’s definition still serves the AI 
community well, almost half a century after its publication [14]. 

Emergence as a property of artificial systems. In his book Emergence: 
From Chaos to Order, Holland wrote that “Emergence occurs in systems that 
are generated” [7] (page 225). Reviewing Holland’s book, Mallot also opined that 
“In this context [the construction of artificial systems], the problem of 
emergence may actually be a genuine one.” [8] These views accord with our own 
view, namely, that the diagnosis of emergence should be considered (and hence 
our test applied) within domains such as Alife, which are inherently 
constructive endeavors. This view naturally gives rise to clause (i) of our test, i.e., the 
existence of a designer, and of a design language. 

The existence of an observer. Artificial systems are constructed to be 
beheld: one does not usually build one’s system, to then walk away nonchalantly 
without ever looking back. Hence, there exists an observer ipso facto (who need 
not necessarily be the constructor himself), a fundamental aspect which has not 
escaped researchers in the field. In a paper discussing emergence and artificial 
life, Cariani wrote that “The interesting emergent events that involve artificial 
life simulations reside not in the simulations themselves, but in the ways that 
they change the way we think and interact with the world.” [2] (page 790) He 
goes on to say that “computer simulations are catalysts for emergent processes 
in our own minds...” [2] (page 790) 

Another author, Emmeche, in an introductory monograph on artificial life, 
examines the case for emergence “in the eye of the beholder.” [4] (page 145) 
Also, Crutchfield, in an article devoted to the subject of emergence, asks: “But 
for whom has the emergence occurred? More particularly, to whom are the 
emergent features ‘new’?... The newness in both cases is in the eye of an 
observer...” [3] (page 517) 

Holland brings up the issue of the observer circuitously, when writing that 
“The whole is more than the sum of the parts in these generated systems... Said 
another way, there are regularities in system behavior that are not revealed by 
direct inspection of the laws satisfied by the components.” [7] (page 225) One 
may ask: direct inspection by whom? Why, by the observer, of course!¹ Clearly, 
the existence of an observer is a sine qua non for the issue of emergence to arise 
at all. 

¹ Holland also cites a passage from Gell-Mann’s book The Quark and the Jaguar [6], 
which also brings up indirectly the role of the observer: “In an astonishing 
variety of contexts, apparently complex structures or behaviors emerge from systems 
Surprise. By bringing the observer’s emotion of surprise into play, our emergence 
test widens the focal beam of discussion, now shining both on the system’s 
behavior as well as on the experimenter and her internalized expectations. This 
relates to Cariani’s nutshell description of emergence relative to a model as 
“the deviation of the behavior of a physical system from an observer’s model 
of it.” [2] (page 779) An author subscribing to said deviation-from-model view 
would wish to document her a priori expectations before diagnosing emergence 
and abandoning attempts at explanation. Our emergence test might then be 
reformulated as Design (Expectations), Observation, Surprise. 

To summarize, the three clauses of our emergence test are grounded in previous 
work: the design clause expresses our wish to restrict the test to artificially 
constructed systems; the observation clause reflects the necessity of there being 
an observer for emergence to arise at all; and the surprise clause embodies both 
the deliberation and the emotion implied by human judgments of value. 
Abstract. I will discuss the plausibility of Mark Bedau’s supple 
adaptation view of life. After presenting the essential aspects of the view, I 
introduce a hypothetical system of romance novels and romance novel authors, 
which captures those aspects of a system which are necessary for life under 
Bedau’s view. Using this romance novel system as a starting point, I discuss 
other intuitively non-living systems, like economies and scientific literature, 
which would be considered alive by Bedau’s standards. Finally, I present one 
objection to Bedau’s view and suggest how his view might change in order to 
become more widely accepted. 



1 Introduction 

Bedau [2] has argued that individual biological organisms are not alive, but that the 
supplely adaptive systems which produce these organisms are. This implies that some 
intuitively non-living supplely adaptive systems, like economies and scientific 
literature, are alive. I will present a view of romance novels under which they would 
be supplely adaptive and thus alive. Using this simple example, I will investigate 
some of the central questions arising from the supple adaptation view in an attempt to 
determine what proponents of the view must address for the view to gain significant 
acceptance. Finally, I will present an objection to the theory, which seems to be the 
problem around which many of these central questions revolve. 



1.1 The Supple Adaptation View 

Bedau asserts that “...supple adaptation does not merely produce living entities.... 
[but that] the entity that is living in the primary sense of that term is the supplely 
adapting system itself...” [2]. Supple adaptability is defined as “...the way in which 
evolution automatically fashions and refashions complex intelligent strategies for 
flourishing as local contexts change” [1]. A living organism is not alive solely by 
virtue of its own characteristics, but because it is part of a larger system that shows 
continual adaptive innovation [2]. 

Supple adaptability itself can be quantified by measuring the prevalence of a trait 
over time. If a trait has adaptive significance, it will persist in the gene pool much 
longer than if it has no significant effect, or a detrimental effect, on fitness. The null 
model for this judgment would have the same mutation rates as the real system, but 
in the null model the genotype would have no effect on survival. If a system has traits 
that persist longer than they would in a null model, it is engaging in adaptive 
evolution [3]. Artificial life models either never engage in adaptive evolution, or do so 
only until they solve the problem presented by their environment; after that, new traits 
persist only as long as they would in the null model, and the simulation stops evolving [4]. 
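The comparison against a null model can be illustrated with a minimal sketch. It assumes, for simplicity, that each individual carries a set of trait labels, that a trait’s activity is its number of carriers summed over generations, and that the null-model history has been generated separately (same mutation rates, no effect of genotype on survival). The function names and the 95th-percentile cut-off are hypothetical; Bedau and Packard’s own statistics are more elaborate [3].

```python
from collections import Counter


def cumulative_activity(history):
    """Sum, over all generations, the number of individuals carrying each trait.

    history: list of generations; each generation is a list of individuals;
    each individual is an iterable of hashable trait labels.
    """
    activity = Counter()
    for generation in history:
        for individual in generation:
            activity.update(set(individual))  # count each trait once per carrier
    return activity


def exceeds_null(real_activity, null_activity, quantile=0.95):
    """Traits whose real activity lies above the given quantile of null activities."""
    null_values = sorted(null_activity.values())
    if not null_values:
        raise ValueError("the null model produced no traits to compare against")
    index = min(len(null_values) - 1, int(quantile * len(null_values)))
    threshold = null_values[index]
    return {trait for trait, a in real_activity.items() if a > threshold}


# Toy data: trait "a" is carried by everyone in every generation of the real run.
real = [[{"a", "b"}, {"a"}], [{"a"}, {"a"}], [{"a"}, {"a", "c"}]]
null = [[{"x"}, {"y"}], [{"x"}, {"z"}]]
print(exceeds_null(cumulative_activity(real), cumulative_activity(null)))  # {'a'}
```

Only traits that persist beyond what the null model produces are flagged, which is the sense in which a system is said to engage in adaptive evolution.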

Though this seems to be an interesting statistical method for analyzing 
evolutionary dynamics, the claim that it is the defining characteristic of life is much 
more opaque. In Farmer and Belin’s view, “[t]here seems to be no single property that 
characterizes life. Any property that we assign to life is either too broad... or too 
specific...” [6]. Bedau argues that our intuitive feelings about life vary from time to 
time and culture to culture, and, as such, we should not look for one property, like 
metabolism [11], that explains everything that we intuitively feel to be alive, but look 
for a property that really is life. In taking this approach, Bedau is looking for 
something that essentially produces living phenomena [1]. 

He offers the following definition: 

A. x is living iff x is living₁ or x is living₂. 

B. x is living₁ iff x is a system undergoing supple adaptation. 

C. x is living₂ iff there is some living₁ system y such that either (1) x meets 
condition A₁, and y meets condition B₁, and x bears relation C₁ to y, or (2) x meets 
condition A₂, and y meets condition B₂, and x bears relation C₂ to y, or . . . or (n) x 
meets condition Aₙ, and y meets condition Bₙ, and x bears relation Cₙ to y. [2] 

This definition may solve many problems of life [1], but it implies that human beings 
are not fully alive, but only alive in a secondary sense, while the biosphere itself is 
fully alive. The definition also suggests that “...clay crystallites, autocatalytic 
networks of chemicals, and even human intellectual and economic systems ... also 
deserve to be thought of as ‘living’” [2]. But all of these things are intuitively 
non-living. 

While this seems difficult for many to accept, focusing on life as a property of both 
a system and an individual may be promising, as it will eliminate the need to look at 
life as a collection of properties rather than a single defining property. This critique is 
intended to motivate more work on developing a less offensive version of this view, 
which will combine philosophical understanding with Bedau’s statistics to develop a 
powerful and useful definition of life. 



1.2 The Romance Novel System 

In order to simplify my discussion of intuitively non-living supplely adaptive 
systems, I shall present a hypothetical system of romance novels that is intuitively 
non-living but supplely adaptive. The hypothetical system is simpler, and thus easier 
to discuss, than economies or scientific literature. Assume that there is a population of 
romance novel writers, and, because of the huge popularity of romance novels, the 
majority of romance novel readers are also romance novel authors. All of the authors 
are given unlimited artistic freedom; they can write whatever sort of romance novel 
they want, and it will be published. Because the writers are reading each other’s 
novels, they are able to incorporate the good aspects of other authors’ work into their 
own, and, over time, the romance novels as a population will improve in quality. 
These romance novels would have an emergent quality, because no novel would be 
able to describe the system, and it would not be meaningful to discuss the entire 
system in terms of an individual novel. Rather, aspects of the novels, like length, plot, 
target audience, and relationship, would change as the system evolved. 

One could easily investigate such a system using the framework of Bedau and 
Packard [3]. Aspects of the novels, as well as specific plot devices, could be tagged as 
traits and then traced using the Bedau and Packard method across the population over 
time. Because the authors are using these traits for some purpose and, quite probably, 
saw them first in another novel, these traits would probably form statistically 
significant activity waves, and, as such, the system would be supplely adaptive. 



2 Issues Raised by the Romance Novel System 

Intuitively, the romance novel system does not seem to be alive, even though it is 
supplely adaptive. Thus, we must ask why supple adaptation should be viewed as life 
when it characterizes intuitively non-living systems as living. Supple adaptation faces 
a critique similar to Gould and Lewontin’s [7] “just so story” critique of the 
adaptationist program. Furthermore, supplely adaptive systems should be tested 
against commonly held hallmarks of life [6, 10]. Even if a list of hallmarks does not 
provide necessary and sufficient conditions, it does seem to be a good way to test a 
definition of life. 



2.1 Identifying Traits and Developing a Null Model 

Gould and Lewontin [7] charged that there is more than one way for something to 
develop traits, and that the adaptationist program is flawed because “...the rejection 
of one adaptive story usually leads to another...”, and one can always “concoct a 
plausible story.” Supple adaptation may be similarly flawed. Even though supple 
adaptation is statistically grounded, the information that these statistics are applied to 
is highly subjective. In all real-world cases, like the analysis of the fossil record in Bedau et 
al. [4], the researcher must develop the null model. In the aforementioned case, for 
example, it is assumed that any activity that appears in the fossil record at the family 
level is adaptively significant. Like Gould and Lewontin’s adaptationists, Bedau et al. 
are choosing what to regard as adaptively significant. They choose how to create the 
null model and what constitutes an attribute. Even in intuitively living organisms this 
is a serious issue. Given the amount of junk DNA in biological creatures, a single 
gene is too small to be an attribute and a specific species may be too large. In 
intuitively non-living systems, defining attributes is particularly difficult. What kind 
of change in a romance novel would count as the creation of a new trait, and how 
should these traits be differentiated, given the interconnectedness of different aspects 
of the novel (e.g., plot and length seem inherently related)? Such questions are also 
difficult for an economy, as differentiating economic units is, in and of itself, difficult. 
2.2 Our Existing Notion of Life 

How well do supplely adaptive systems fit with our preconceptions about life? 
Mayr [10] and Farmer and Belin [6] have both proposed lists of features that we 
intuitively believe to be present in most living things. Farmer and Belin’s list is as 
follows: 

Life is a pattern in spacetime, [s]elf-reproduction....[,] storage of a 
self-representation....[, a] metabolism...[, f]unctional interaction with the 
environment....[, i]nterdependence of parts....[, s]tability under 
perturbations....[, and t]he ability to evolve. [6] 

In a limited sense, intuitively non-living supplely adaptive systems, like my 
hypothetical romance novel system, meet these criteria. They appear as a pattern in 
spacetime and engage in self-reproduction. To whatever extent they are subject to 
some version of the second law of thermodynamics, they have a metabolism. They 
do not seem to store any sort of self-representation, but, because their phenotypic and 
genotypic levels are the same, as they are both the words of the novel, this appears 
unnecessary. They interact with their environment. Romance novel writers, for 
example, might see changes in society and begin producing new romance novels in 
anticipation of these changes. Lastly, they have interdependence of parts, stability 
under perturbation, and the ability to evolve. 

Many of these properties, however, are the result of the fact that the system is 
produced by living organisms and not elements of the system itself. In the 
hypothetical romance novel system, for example, self-reproduction, metabolism, and 
information storage seem to be a function of the biological organisms that write the 
novels and not the system itself. Moreover, interaction with the environment is 
entirely a result of the predictions of the organisms which form the system, and not of 
the system itself. Thus, advocates of supple adaptation must explain why it is 
meaningful to discuss intuitively non-living systems as independently living, rather 
than the sum of their living parts. 



3 The Concrete Room Objection 

The importance of how the actors in a system are related to the theater in which 
they interact is illustrated by the following example. Consider taking a small, 
functioning ecosystem and placing it in a concrete room. It will continue to live, and it 
will probably evolve in response to the selective pressures of being in a concrete room, and, as 
such, become a long-run supplely adaptive system in its own right. Nevertheless, the 
concrete room has not undergone any supple adaptation. The organisms might even 
produce their own concrete, thereby significantly changing the concrete room; still, 
the concrete room is not evolving and is not alive. 

The romance novel system plays the same role as the concrete room. Even though 
the writers are producing a definite product, the features of which can be identified 
without reference to the authors, this product is merely a new way for the biological 
entities producing it to interact. The system as a whole is not clearly different from 
the entities that constitute it. The system undergoes evolution, but that evolution is 
merely an artifact of the interactions of the entities which constitute it. Until the 
proponents of supple adaptability can show that the romance novel system is uniquely 
alive, and, as such, gives a sort of life to the romance novels which constitute it, or 
move in the direction of this critique and explain why it is not alive, supple 
adaptability will not be able to gain widespread acceptance. 



4 Conclusions 

Though supple adaptation seems, in many ways, a useful way of explaining life, its 
usefulness will remain limited until researchers directly address the issue of 
intuitively non-living supplely adaptive systems. Though Bedau claims that “It does 
not matter whether this theory supports our current preconceptions about life or fits 
the current meaning of our word ‘life’” [1], he has yet to directly address our 
preconceptions about life. As long as this is not dealt with, supple adaptation will 
remain difficult to accept. 

Nevertheless, supple adaptation, especially as modified by this critique, can help 
other views of life, as well as potentially become a viable view in its own right. 
Hopefully, this work will help supple adaptation overcome its shortcomings and show 
other researchers interested in defining life the value of addressing both the system 
and the individuals who compose it. 
Abstract. Hinton and Nowlan have demonstrated a model of how lifetime 
plasticity can guide evolution. They show how acquired traits change the shape of 
the reward landscape in which subsequent genetic variation takes place, and in so 
doing encourage the discovery of equivalent heritable traits. This enables the 
seemingly Lamarckian inheritance of acquired characteristics without the direct 
transfer of information from the phenotype to the genotype. This paper draws 
direct inspiration from their work to illustrate a different phenomenon. We 
demonstrate how the formation of symbiotic relationships in an ecosystem can 
guide the course of subsequent genetic variation. This phenomenon can be 
described as two phases: First, symbiotic groups find solutions where individual 
organisms cannot, simply because lifetime interaction produces new 
combinations of abilities more rapidly than the relatively slow genetic variation 
of individuals. Second, these symbiotic groups subsequently change the shape of 
the reward landscape for evolution, providing a gradient that guides genetic 
variation to the same solution. Ultimately, an individual organism exhibits the 
capabilities formerly exhibited by the group. This process enables the 
combination of characteristics from organisms of distinct species without direct 
transfer of genetic information. 



1 Introduction 

Symbiosis, in its general definition, is simply the living together of different organisms. 
Often, in lay usage, the term is used to refer to the special case of mutualism where 
symbionts (organisms in symbiotic relationship) are mutually beneficial. Despite being 
undeniably common, the phenomenon of symbiosis, and especially mutualism, has for 
the most part been treated as a curio: a transient aberration on the otherwise relentless 
path of mutually-exclusive competition between species. In contrast, enlightened 
evolutionary theory recognises symbiosis as an integral process, and a fundamental 
source of innovation, in evolution. In its strongest form, symbiosis can lead to 
symbiogenesis: the genesis of new species via the genetic integration of symbionts [12], 
[9], [8], [10]. For example, eukaryote cells (from which all plants and animals are 
descended) have a symbiogenic origin [10]. 

The genetic integration of pre-adapted organisms is a fundamentally different source 
of innovation from the Darwinian accumulation of random variations. Computational 
abstractions of random variations under differential selection are well established in 
evolutionary algorithms research. However, functional models for the interaction 
between the formation of symbiotic groups and the accumulation of genetic variation 
are under-researched. In this paper we make a modest start by modeling one mechanism 
by which symbiosis and genetic variation can interact. We do not pretend to model the 
biological details involved in any way; rather, we provide an abstract computational 
model examining the dynamics of the adaptive mechanisms. Moreover, in this paper we 
begin with a model of an indirect mechanism whereby symbiosis can enable 
innovation, rather than the more radical process of symbiogenesis, in which symbionts are 
genetically integrated directly. 

The mechanism behind our model is directly inspired by the work of Hinton and 
Nowlan in their 1987 paper [6], “How Learning Can Guide Evolution”. Their paper 
demonstrates the Baldwin effect [1], a phenomenon whereby acquired characteristics 
can induce equivalent heritable characteristics. This seems like Lamarckian inheritance 
of acquired characteristics but it occurs without direct transfer of information from the 
phenotype to the genotype. 

Hinton and Nowlan provide a simple and elegant abstract model that exemplifies 
how this process can occur and their model has been replicated and extended many 
times [2], [5], [11]. Here we have adapted their model by replacing learning with 
symbiosis, or more generally, by replacing lifetime plasticity of an organism with lifetime 
interaction between organisms. Their experimental setup provides a convenient starting 
point and, moreover, an interesting comparison that assists us in understanding a more 
general concept that encompasses both phenomena. 

Using this model we show that lifetime interaction can enable the evolution of 
organisms that would otherwise be unobtainable — or at least, would be very unlikely to 
occur. Our simulation of this phenomenon reveals two phases. First, symbiotic groups 
find the solution to a problem (a set of abilities that confers high reproductive fitness) 
more quickly than the solution can be found by a single organism. This occurs simply 
because the combination of abilities via lifetime interaction of organisms samples a 
much larger set of variations than the relatively slow genetic variation from mutation. 
The first phase alone does not demonstrate the evolution of an organism that would 
otherwise not occur; rather, we have simply selected a mutually beneficial group of 
organisms out of those that do occur. 

In the second phase, after a group has found the solution and an ecosystem of 
mutually beneficial organisms has become established, the evolution of the individual 
organisms therein operates in a different environment. Where previously an organism 
that exhibited some fraction of the necessary abilities, but not all the necessary abilities, 
would fail, now symbionts will occasionally fill in for this organism’s inadequacies. 
Moreover, the greater the fraction of necessary abilities it exhibits, the less filling-in is 
required, i.e., the less it depends on symbionts and the more reliably successful it is. 
This provides a gradient to guide genetic search toward an organism that can ultimately 
perform independently. Without the support of symbionts this gradient does not arise. 

Thus, the abilities discovered by the symbiotic group become encapsulated in the 
heritable traits of a single individual. Yet this effect occurs without the exchange of 
genes — the symbionts may be distinct species. We call this effect symbiotic scaffolding: 
the symbionts support each other as partially able organisms, and enable the gradual 
accumulation of abilities, until ultimately, when their abilities are complete, the 
scaffolding is not required. This mechanism is illustrated in the following simulations. 

The remaining sections of this paper are organised as follows. In Section 2 Hinton 
and Nowlan’s learning scenario is adapted to our symbiosis model. Section 3 details the 
experimental setup and gives results. In Section 4 we indicate a general principle that 
encompasses both the Baldwin effect and symbiotic scaffolding, and suggest 
implications for evolutionary computation methods. Section 5 concludes. 



2 A scenario for symbiosis 

Following Hinton and Nowlan’s lead we describe an extreme and simple scenario 
where the combinatorics of the phenomenon are clear. We consider a problem that 
consists of a large number of variables all of which must be correctly specified by an 
organism in order for that organism to receive any reproductive fitness. In such cases an 
organism that is partially correct, even one that specifies all but one of the variables 
correctly, is not rewarded at all. This worst-case scenario is the extreme case of 
irreducible complexity, in which solutions can only be found by trying possibilities at 
random. In Hinton and Nowlan’s learning model the problem is to find the 20 correct 
connections for a neural network. For our symbiosis model we may imagine a chemical 
cycle with 20 steps. Each of the 20 steps must be correctly catalysed by an organism in 
order to get the chemical cycle going and thereby confer reproductive fitness. 
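The combinatorics of this worst case can be made concrete with a little arithmetic. For illustration only, suppose each of the 20 steps is independently set to correct or incorrect with probability 1/2. A single random guess then completes the cycle with probability

```latex
P(\text{all 20 steps correct}) = \left(\tfrac{1}{2}\right)^{20}
  = \frac{1}{1\,048\,576} \approx 9.5 \times 10^{-7},
```

so on average about a million guesses are needed, and every guess short of a full solution earns the same zero reward, leaving no gradient for selection to follow.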

Where Hinton and Nowlan introduce lifetime plasticity to guide genetic search in 
this unforgiving landscape, we shall use lifetime interaction with other organisms. 



The interaction of organisms 

Hinton and Nowlan suppose that some of the connections in the neural network will be 
left genetically unspecified and replaced with a switch that can make or break the 
connection during the lifetime of the organism. Here we suppose that an organism may 
have a neutral effect on a step in the chemical process, that is, the organism will neither 
prohibit nor catalyse the chemical step, and that this step may (or may not) be 
completed by some other organism in the environment. That is, an organism can gain 
the benefit (or penalty) of chemical byproducts created by the processes of other 
organisms in the ecosystem (for those steps where the organism itself is neutral). 

We may crudely represent the traits of an organism in 20 genes, where each gene has 
three alleles: correct, incorrect or neutral, corresponding respectively to catalytic, prohibitive 
(preventing completion of the cycle), or neutral interaction with a step in the chemical 
cycle. This unrealistic simplification enables us to see the mechanisms of interest more 
clearly but it is not integral to the results that follow. 
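A minimal sketch of this encoding follows, assuming genes are stored as a 20-character string using the symbols of Fig. 1: `1` (correct, catalytic), `0` (incorrect, prohibitive) and `-` (neutral). The constant and function names are hypothetical. An organism acting alone completes the cycle only if every gene is correct, because a neutral gene leaves its step uncatalysed and an incorrect gene blocks the cycle.

```python
CORRECT, INCORRECT, NEUTRAL = "1", "0", "-"  # catalytic, prohibitive, neutral
N_STEPS = 20  # steps in the chemical cycle


def cycle_completes(abilities: str) -> bool:
    """All-or-nothing test: True only if every step is catalysed.

    A neutral gene leaves its step uncatalysed and an incorrect gene
    blocks the cycle, so either one means no reward at all.
    """
    if len(abilities) != N_STEPS or set(abilities) - {CORRECT, INCORRECT, NEUTRAL}:
        raise ValueError("expected 20 symbols drawn from '1', '0' and '-'")
    return all(gene == CORRECT for gene in abilities)


# An organism with 19 of 20 steps right earns exactly as little as one with none.
print(cycle_completes("1" * 20))        # True
print(cycle_completes("1" * 19 + "-"))  # False
print(cycle_completes("1" * 19 + "0"))  # False
```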

Thus far we have followed Hinton and Nowlan’s model exactly except with a 
different metaphor. Where they used correct connections, absent connections and 
plastic connections for a neural network we use catalytic, prohibitive and neutral 
influence on steps of a chemical cycle. Now, where Hinton and Nowlan use lifetime 
learning to specify the plastic connections of the network, we will substitute lifetime 
interaction between organisms to fill in for the missing abilities of the cycle. 

In the same way that Hinton and Nowlan use random search as their learning model 
because it makes the fewest assumptions necessary, we use the happenstance co-location 
of organisms to determine their symbiotic interaction, since it makes minimal 
assumptions. The use of a more sophisticated model of symbiotic relationship-forming 
will illustrate the scaffolding effect more strongly; we stress that our model of 
organism interaction is deliberately trivial so as to prevent details from obscuring the 
essence of the effect. In our model we may imagine that organisms are randomly 
distributed in the environment and perpetually mixed. At any one instant there will be 
some number of other organisms in the immediate vicinity of the organism in question. 
Thus every organism is tested by combining its abilities with those of several other 
randomly selected organisms. Fig. 1 shows how the abilities of organisms are 
combined. 



[Figure: the genotypes of the first to fifth organisms, each a string of 1, 0 and - symbols, stacked above their combined abilities: 00100110010111100011]



Fig. 1. Combining the abilities of organisms. The 20 genes of each organism may take one of 
three alleles: correct, incorrect or neutral, shown as 1, 0 and - respectively. Notice that the 
traits of the first organism take priority over all others; for consistency, the traits of the second 
organism take priority over all but the first, and so on. Since every trait is specified by at least 
one of the first 4 organisms in this example, the fifth organism shown here is redundant. 



Since the selection of, and ordering of, the organisms will be random, the details of 
this mechanism are largely inconsequential to the result that follows. One important 
feature, however, is that the fitness of the combined traits will be awarded to the first 
organism only, and that the traits of the first organism are not over-ruled by any other. 
However, since the first organism will likely fill in for other organisms in their turn, 
this asymmetry is reciprocated. Alternative models of interaction and reward distribution 
may be equally valid; however, the current model is sufficient for our purposes. 
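The priority rule of Fig. 1 can be sketched as follows, assuming genotypes are strings over `1`, `0` and `-`; the function name `combine` is hypothetical, and short four-gene genotypes are used only to keep the example readable.

```python
NEUTRAL = "-"


def combine(group):
    """Combine the abilities of a group of organisms as in Fig. 1.

    group[0] is the organism being evaluated; its genes are never over-ruled.
    Each later organism only fills in genes that are still neutral after all
    earlier organisms have been applied.
    """
    if not group:
        raise ValueError("a group needs at least one organism")
    combined = list(group[0])
    for other in group[1:]:
        if len(other) != len(combined):
            raise ValueError("all genotypes must have the same length")
        for i, gene in enumerate(other):
            if combined[i] == NEUTRAL:
                combined[i] = gene
    return "".join(combined)


print(combine(["1-0-", "01-1", "0011"]))  # 1101 (third organism is redundant)
print(combine(["--", "-1"]))              # -1 (no one specifies the first step)
```

As in the figure, once every gene has been specified by earlier organisms, any further organisms in the group are redundant.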



Evaluation 

A key feature of the Hinton and Nowlan model is the fact that lifetime plasticity can 
search combinations of traits far more rapidly than genetic variation. They allow 
random search to test 1000 combinations of the plastic connections during the lifetime 
of one individual. Similarly, we allow the combination of abilities via lifetime 
interaction of organisms to be much more rapid than genetic variation. Accordingly we 
test 1000 random groups during the lifetime of an organism. 

So, to evaluate an individual we divide its lifetime into 1000 time-steps. At each
time-step a number of other organisms are selected at random from the ecosystem. The
number of individuals picked may differ from one time-step to the next, and the
probable number is varied in the experiments that follow. The abilities of these
organisms fill in for the missing traits of the organism being evaluated, as described in
Fig. 1. If the combined abilities of the group exhibit all required abilities correctly
(and therefore enable the entire chemical cycle), the organism receives a fitness
increment; otherwise its fitness is unaffected. This is repeated for all 1000 time-steps,
with a new randomly selected group each time. Overall, the fitness of an organism is
given by $f = 1 + n$, where n is the number of time-steps in which the organism in
question forms a successful group with the organisms in its vicinity.¹
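The evaluation loop can be sketched as follows for the crowded case, where partners are added until every trait is specified. This is a hedged illustration rather than the authors' implementation. The allele encoding (1, 0, None), the function name `evaluate` and the `max_partners` safety cap are our own assumptions.

```python
import random
from typing import List, Optional, Sequence

Allele = Optional[int]  # assumed encoding: 1 correct, 0 incorrect, None neutral


def evaluate(organism: Sequence[Allele], ecosystem: Sequence[Sequence[Allele]],
             rng: random.Random, time_steps: int = 1000,
             max_partners: int = 100) -> int:
    """Return f = 1 + n, where n counts time-steps with a successful group.

    Partners are drawn at random until every locus is specified (the
    crowded-ecosystem case). max_partners is a hypothetical safety cap.
    """
    n = 0
    for _ in range(time_steps):
        combined: List[Allele] = list(organism)
        partners = 0
        while None in combined and partners < max_partners:
            partner = rng.choice(ecosystem)
            partners += 1
            # Earlier members keep priority: only neutral loci are filled in.
            combined = [a if a is not None else b for a, b in zip(combined, partner)]
        if all(allele == 1 for allele in combined):
            n += 1
    return 1 + n


rng = random.Random(0)
ecosystem = [[1] * 20 for _ in range(10)]
print(evaluate([None] * 20, ecosystem, rng))  # 1001
```

Each time-step is an independent trial with the same success probability p, so the expected fitness under this scheme is 1 + 1000p, which is linear in p.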



3 Experiments 

The genetic model, the method of interaction, and the evaluation described above are 
iterated in a genetic algorithm (GA) [7]. Hinton and Nowlan choose the population size, 
number of lifetime trials, and number of variables in the problem carefully so as to 
make it most unlikely that genetic variation alone will find the solution but very likely 
that lifetime variation will. We continue to follow the experimental parameters of 
Hinton and Nowlan where applicable for the same reasons. We use a population of 
1000 individuals, or in our symbiosis terminology, an ecosystem of 1000 organisms. 
Fitness-proportionate reproduction is applied generationally [7].

Since we are interested in the ability of symbiotic scaffolding to encapsulate the 
abilities of symbionts of different species, we will not use genetic recombination 
(crossover) in our main experiments. We shall return to this point below. Instead we 
will use mutation as our only source of genetic variation. Mutation is applied with a 
bitwise probability of 0.05 of assigning a new random value. New values are randomly 
selected to be correct, incorrect or neutral genes with probability 0.25, 0.25, 0.5 
respectively. These same proportions are used to construct the initial population. Hence 
in generation 0 each organism will have about 50% neutral genes (as shown in Fig. 1). 
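The initialisation and mutation scheme just described might look like this. It is a sketch under the same assumed allele encoding (1 correct, 0 incorrect, None neutral), and the function names are illustrative.

```python
import random
from typing import List, Optional

Allele = Optional[int]  # assumed encoding: 1 correct, 0 incorrect, None neutral

ALLELES = [1, 0, None]             # correct, incorrect, neutral
ALLELE_WEIGHTS = [0.25, 0.25, 0.5]
GENOME_LENGTH = 20                 # the 20 abilities of the chemical cycle
MUTATION_RATE = 0.05


def random_allele(rng: random.Random) -> Allele:
    return rng.choices(ALLELES, weights=ALLELE_WEIGHTS, k=1)[0]


def random_organism(rng: random.Random) -> List[Allele]:
    """Initial population: alleles drawn with the same proportions as mutation."""
    return [random_allele(rng) for _ in range(GENOME_LENGTH)]


def mutate(organism: List[Allele], rng: random.Random,
           rate: float = MUTATION_RATE) -> List[Allele]:
    """Each locus is reassigned a fresh random allele with probability `rate`
    (the fresh value may coincide with the old one)."""
    return [random_allele(rng) if rng.random() < rate else allele
            for allele in organism]
```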

Parts of the results that follow are largely predictable given the experimental setup;
as in Hinton and Nowlan's experiments, the results are used to confirm our intuitions.

Experiment 1: A Crowded Ecosystem 

In our first experiment we assume that the ecosystem is crowded: at any time-step there
will be more than enough organisms in the immediate vicinity of an organism to fill in
all its neutral abilities. Naturally, when the organisms contain random genes they
probably will not fill in the steps of the cycle correctly. And, of course, if the organism



¹ In this detail we differ from the fitness function used by Hinton and Nowlan, which is
$1 + 19n/1000$, where n is the number of time-steps remaining after the first successful
lifetime trial. Our method makes the expected fitness of an individual linear in the
probability of success in a single trial, whereas the Hinton and Nowlan model gives an
expected fitness that is highly non-linear with respect to this probability [Harvey 1993].







being evaluated has any incorrect genes, it cannot form any successful groups. We
build groups by accumulating organisms at random until all 20 abilities are specified
(one way or the other). Since each organism specifies approximately half of the full set
of genes, the average number of organisms required to specify all traits is generally small.
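The claim that groups are generally small can be made concrete with a short calculation. We assume, as in generation 0, that each locus of each organism is neutral independently with probability 0.5. That independence is a simplification, since later populations are correlated. We also count the organism being evaluated as a group member. Under these assumptions the expected group size follows from $E[N] = \sum_{k \ge 0} P(N > k)$:

```python
def expected_group_size(loci: int = 20, p_neutral: float = 0.5,
                        tol: float = 1e-12) -> float:
    """Expected number of organisms (including the one being evaluated) drawn
    until every locus is specified.

    Assumes each locus of each organism is neutral independently with
    probability p_neutral. After k organisms a locus is therefore still
    unspecified with probability p_neutral**k, and
    P(N > k) = 1 - (1 - p_neutral**k)**loci.
    """
    if not 0.0 <= p_neutral < 1.0:
        raise ValueError("p_neutral must lie in [0, 1)")
    total, k = 0.0, 0
    while True:
        tail = 1.0 - (1.0 - p_neutral ** k) ** loci
        if tail < tol:
            return total
        total += tail
        k += 1


print(round(expected_group_size(), 2))
```

With 20 loci this gives a value between five and six organisms. That is small compared with the ecosystem of 1000.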

Fig. 2 shows the number of each allele per organism, averaged over all organisms in
the ecosystem, at each generation. The proportion of alleles at the start of the
experiment follows the mutation probabilities, i.e. approximately 0.25, 0.25 and 0.5 for
correct, incorrect and neutral respectively. Around the 20th generation a dramatic
change takes place: the proportion of incorrect alleles falls close to zero while the
number of correct alleles rises.² This is the point at which symbiotic organisms become
established and incorrect alleles are purged from the gene pool.



Fig. 2. Experiment 1. Number of genes of each allele per organism (y-axis: average
number of genes of each allele per organism), averaged over all 1000 organisms at each
generation. The first 200 generations are shown; the proportion of alleles varies
insignificantly over subsequent generations (up to 1000).



Fig. 3. Average and minimum size of successful groups in experiment 1 (y-axis: number
of organisms per successful group). Where no successful group is formed (as in some of
the first few generations), both the average and minimum group sizes are shown as zero.



The effects of lifetime interaction shown in Fig. 2 bear some qualitative similarity to
the results of Hinton and Nowlan on the Baldwin effect. This is no coincidence: the
substitution of lifetime interaction for lifetime plasticity is algorithmically
insignificant in the initial stages, where the abilities provided by other organisms are
essentially random. However, once selection has taken hold and the population is
predominantly devoid of incorrect alleles, the nature of the symbiotic variation is quite
different from that of the random assignment used by Hinton and Nowlan. Specifically,
the alleles supplied by symbionts are nearly all correct (whereas the traits supplied by
random search remain in the original proportions). This means that the selection
pressure for specifying increasing numbers of correct genes is very weak: in nearly all
time-steps, other members of the group will correctly supply the required abilities. The



² Here, and more so in experiment two, the exact generation at which these sharp changes
occur varies from run to run due to the stochastic nature of the experiment. However, the
magnitude and general shape of the phenomena are reliable.
only significant pressure on an organism is that it should not specify any incorrect
genes. A consequence of this complacency is that there is no pressure to acquire all
correct abilities. Although Hinton and Nowlan's original work also shows no significant
trend toward fewer neutral traits, replications of their work have shown a subsequent
increase in correct alleles over time-scales of up to 500 generations [2], [5]. In this
respect the influence of symbiotic groups differs from that of lifetime learning.
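The complacency argument can be illustrated with a deliberately simplified model. Take an organism with no incorrect alleles and u neutral loci. Assume each neutral locus is filled correctly with probability c, independently of the other loci. A time-step then succeeds with probability $c^u$, and the expected fitness is $1 + 1000c^u$. Random assignment corresponds to c = 0.5, while symbionts that are nearly all correct give c close to 1. The names and the independence assumption are ours.

```python
def success_probability(neutral_loci: int, p_fill_correct: float) -> float:
    """Chance that one time-step succeeds for an organism with no incorrect
    alleles, if each neutral locus is filled correctly with probability
    p_fill_correct independently of the others (a simplifying assumption)."""
    return p_fill_correct ** neutral_loci


def expected_fitness(neutral_loci: int, p_fill_correct: float,
                     time_steps: int = 1000) -> float:
    """Expected f = 1 + n over a lifetime of independent time-steps."""
    return 1.0 + time_steps * success_probability(neutral_loci, p_fill_correct)


for u in (0, 5, 10):
    random_fill = expected_fitness(u, 0.5)    # neutral loci guessed at random
    symbiont_fill = expected_fitness(u, 1.0)  # symbionts supply correct alleles
    print(f"neutral loci = {u:2d}: random fill {random_fill:8.2f}, "
          f"symbiotic fill {symbiont_fill:8.2f}")
```

Under random filling, fewer neutral loci means a much higher fitness. With all-correct symbionts, fitness is the same for every u, so nothing rewards replacing neutral alleles with correct ones.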

In Fig. 3 we see that the dramatic changes in Fig. 2 coincide with the establishment 
of groups that solve the complete cycle. We also see that the average size of successful 
groups increases dramatically. Given that organisms have approximately 50% of the 
correct alleles on average at this stage in the experiment, the fact that large groups are 
required to complete the set of abilities implies that many organisms have neutral 
alleles at the same loci. Informal observation also indicates that the population is 
somewhat converged at this time. By about the 50th generation the average group size 
falls to about six members which, given that the proportion of correct alleles has not 
changed significantly, shows that some complementary abilities have become 
established in the population. The smallest successful groups are of size two. 

In summary, the crowded ecosystem in the first experiment shows the establishment 
of symbiotic organisms but does not demonstrate the entire scaffolding effect. We do 
not see the subsequent guidance of genetic variation to find an individual organism with 
the abilities formerly exhibited by the group. 



Experiment 2: A Sparse Ecosystem 

In our second experiment, we suppose the ecosystem is sparsely inhabited, so the
number of organisms in the immediate vicinity of the organism being tested is limited.
In the implementation, we limit group sizes probabilistically, selecting the limit at
random from an exponential (geometric) distribution. Specifically, the probability of
there being exactly k members in a group is $2^{-k}$, $k \ge 1$. In this way it is most likely
that an organism will be evaluated on its own, next most likely with one other
organism, and so on. In short, an organism cannot rely on the availability of symbionts,
and an organism that is more self-sufficient will receive a higher fitness on average.
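Sampling a group-size limit with $P(k) = 2^{-k}$ amounts to counting fair-coin tosses up to the first head. The sketch below shows that sampler and one way the limit could cap the filling-in step. Counting the evaluated organism as one of the k members is our assumption, as are the function names. Loci that are still neutral when the limit is reached simply make the group unsuccessful.

```python
import random
from typing import List, Optional, Sequence

Allele = Optional[int]  # assumed encoding: 1 correct, 0 incorrect, None neutral


def sample_group_limit(rng: random.Random) -> int:
    """Draw k >= 1 with P(k) = 2**-k: count fair-coin tosses up to the first head."""
    k = 1
    while rng.random() >= 0.5:
        k += 1
    return k


def combined_traits(organism: Sequence[Allele],
                    ecosystem: Sequence[Sequence[Allele]],
                    rng: random.Random) -> List[Allele]:
    """Fill in neutral loci from at most k - 1 random partners, where k is the
    sampled group-size limit (the evaluated organism counts as one member)."""
    limit = sample_group_limit(rng)
    combined = list(organism)
    for _ in range(limit - 1):
        if None not in combined:
            break
        partner = rng.choice(ecosystem)
        combined = [a if a is not None else b for a, b in zip(combined, partner)]
    return combined
```

Half of all evaluations are solitary (k = 1), and the mean group-size limit is 2.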

In Fig. 4 we see the same dramatic trends seen around generation 20 in the first
experiment, though here they occur later. But we also see quite different phenomena
thereafter. Unlike the first experiment, there is a clear upward trend in the number of
correct alleles in subsequent generations, and a significant increase starting at around
generation 160. Whereas the first sharp increase corresponds to the purging of incorrect
alleles, the second corresponds to the purging of neutral alleles.

Examining the plots of average successful-group size in Fig. 5, we see that, although
there are a few instances of successful groups in the first 70 generations, successful
groups take longer to become established than in experiment one. This is reasonable,
since in experiment two an organism interacts less with other organisms and therefore
fortuitous co-location is more sporadic. We also see that the average number of
individuals per group does not escalate as sharply as under the evaluation scheme of the
first experiment. Finally, the rise in the proportion of correct alleles at generation 160
corresponds to the appearance of self-sufficient individuals; i.e. a minimum group size
of 1 becomes established. In other independent runs there is considerable variation in
the generation at which symbiotic organisms become established and in the generation
that exhibits the first self-sufficient individual. Nevertheless, the overall effect is reliable.



Fig. 4. Experiment 2. The average number of each allele per organism (y-axis: average
number of genes of each allele per organism). All parameters are as in the first
experiment, except that the size of groups is limited probabilistically.

Fig. 5. Number of individuals per successful group in experiment 2 (y-axis: number of
organisms per successful group).



The difference between experiments one and two is that the high availability of
symbionts in the first experiment produces complacency, whereas their unreliable
availability in the second experiment provides a selection pressure favouring
independent organisms. Thus in experiment two we witness the complete scaffolding
effect: symbionts first enable the adaptive characteristics, then become obsolete. They
have shaped the evolutionary search space so as to enable the evolution of organisms
that perform the function formerly performed only by groups.³



4 Discussion 

We can see in these results the emergence of a general principle that encompasses both
symbiotic scaffolding and the Baldwin effect, specifically a process of rapid variation
guiding a process of slow variation. Either symbiosis or learning may guide subsequent
genetic mutation. Both mechanisms are effective because their fast variation discovers a



³ Control experiments, not shown, included genetic recombination (one-point crossover) in
addition to mutation. The results were essentially similar, though some effects were
exaggerated, probably due to stronger genetic drift [Harvey 1993]. It should also be noted
that the recombination of genes via crossover is quite distinct from the filling-in
mechanism of lifetime interaction shown in Fig. 1; this distinction is to be expected in
biological organisms also. Additionally, as expected, experiments run without lifetime
interaction (with or without crossover) do not succeed within 1000 generations.
solution and then provides a gradient that guides genetic variation to the same solution.

From the biological point of view, the influence of symbiosis on evolution, whether
indirect as in these models or direct as in symbiogenesis, is of interest because it
informs us about the origin of the organisms we see in nature. Abstract computational
models, such as the one presented here and that of Hinton and Nowlan, inform biology
only insofar as they illustrate the possible space of dynamics and interactions. From the
evolutionary computation point of view, the influence of symbiosis in guiding genetic
variation is interesting only if it provides inspiration for more effective algorithms (here
abstractions, rather than specific biological details, are more informative).

For example: Does the combination of a fast variation mechanism with a relatively
slow variation mechanism provide a method that is more powerful than either
mechanism alone? Why might the encapsulation of a group into a single individual be
computationally important? What is the algorithmic difference between symbiotic
combination and sexual recombination? We address these three questions in turn below;
the suggestions offered here have yet to be investigated.

The coupling of fast and slow variation methods provides a balance between
exploration and exploitation. A fast, non-permanent variation mechanism enables low-cost
exploration (lookahead). Subsequent encapsulation via a method of slow variation,
with high commitment, provides stability from which further exploration may take place
without disrupting solutions that have been proven.

The encapsulation of the group into a single organism gives the process the
opportunity to recurse with a larger unit of variation. This implies that the process may be
applicable to hierarchical building-block problems [13], where search progresses from
bit-combinations to schema-combinations as the building-block hypothesis suggests [7].

The difference between symbiotic combination and sexual recombination, or
crossover, as used in existing GAs, is two-fold. First, symbiotic combination occurs
between distinct organisms, whereas sexual recombination occurs between similar
organisms, i.e. from the same gene-pool and necessarily highly converged. This
perhaps suggests the use of recombination operators that mate similar organisms
frequently (as in existing niching methods [3]) but also mate dissimilar organisms
on rare occasions. Second, symbiotic combination is additive whereas crossover is
‘either/or’: the results of symbiogenesis have the sum of the genes from the donors,
whereas offspring from sexual recombination have approximately half the genes from
each parent. This has important implications for respecting the integrity of the
schemata represented by the parents. Such recombination is more akin to the Messy GA
[4] or, more generally, the Incremental Commitment GA [14]. It can also be seen that the
symbiotic filling-in of Fig. 1 corresponds to a ‘competitive template’ in the Messy GA.
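The additive versus either/or distinction can be seen on a toy example. Here two donors each specify a different, interleaved half of six loci. This is an illustrative sketch with our own function names and the assumed allele encoding (1, 0, None). One-point crossover is used as the representative of sexual recombination.

```python
import random
from typing import List, Optional, Sequence

Allele = Optional[int]  # assumed encoding: 1 correct, 0 incorrect, None neutral


def symbiotic_combination(a: Sequence[Allele], b: Sequence[Allele]) -> List[Allele]:
    """Additive: every allele specified by either donor is kept (a has priority)."""
    return [x if x is not None else y for x, y in zip(a, b)]


def one_point_crossover(a: Sequence[Allele], b: Sequence[Allele],
                        rng: random.Random) -> List[Allele]:
    """Either/or: each locus comes from exactly one parent."""
    point = rng.randint(1, len(a) - 1)
    return list(a[:point]) + list(b[point:])


def specified(genome: Sequence[Allele]) -> int:
    return sum(allele is not None for allele in genome)


# Two donors that each specify a different half of the loci.
a = [1, None, 1, None, 1, None]
b = [None, 1, None, 1, None, 1]
print(specified(symbiotic_combination(a, b)))                  # 6
print(specified(one_point_crossover(a, b, random.Random(0))))  # 3 or 4, depending on the cut point
```

The symbiotic combination keeps all six specified genes. Whatever cut point crossover chooses, the child inherits only part of each parent's specification.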



5 Conclusions 



We have seen two instances of an adaptive effect whereby non-genetic mechanisms 
guide the genetic make-up of organisms by shaping the evolutionary landscape. The 
Baldwin effect, modeled by Hinton and Nowlan, demonstrates learning as the first 
example of such a mechanism. In this paper we have adapted Hinton and Nowlan’s
model to illustrate a second example. Specifically, our experiments demonstrate how 
symbiotic scaffolding can guide the genetic make-up of organisms and lead to the 
evolution of organisms that would otherwise not occur. 
Abstract. In this paper, we describe adaptive processes of populations
with two distinct mechanisms of evolution, Darwinian and Lamarckian.
We use a simple abstract model in which neural networks capable of learning
are evolved through GAs. Each individual in the populations tries
to maximize its life energy by learning rules that distinguish between
two groups of materials: food and poison. The best-performing
individuals are selected to reproduce offspring according to their mechanism
of genetic inheritance, which is either Darwinian or Lamarckian,
and the offspring conduct lifetime learning in the succeeding generation.
In particular, we examine the adaptability of both populations toward a
new, unknown world, which is given after some evolutionary steps have
taken place under the original world. As the main result, we show that
only Darwinian populations can adapt to the new world.



1 Introduction 

Through interactions with the environment, or learning, natural organisms may
undergo some adaptive changes. However, these acquired characters are not
re-encoded in their genes and therefore are not directly transmitted to their
offspring. Lamarckism, which has been one of the major ideas concerned with
natural evolution, regards the “inheritance of acquired characters” as the motive
force of evolution. Darwinism, by contrast, holds that evolution is nothing but the
cumulative process of natural selection acting on random mutation, and denies the
possibility of inheritance of acquired characters.¹ As we know, the mainstream of
today’s evolutionary theory follows Darwinism, and Lamarckism is regarded as
incorrect or even heretical.

Due to this biological background, most discussions of learning and evolution
from an artificial life perspective are based on Darwinian models of evolution
[6, 9, 1]. On the other hand, from an engineering view, it is not necessary

¹ Darwin himself accepted the possibility of inheritance of acquired characters, while
Weismann, whose fundamental ideas have had a great influence on Neo-Darwinism,
completely denied it.
to consider only Darwinian models. Indeed, the possibility of using the heredity
of acquired characters is quite attractive, and some studies have shown
that significant improvements in the performance of problem-solving systems
can be attained by introducing Lamarckian mechanisms [5, 2, 7]. In these works,
however, the effectiveness and superiority of Lamarckian evolution have been
demonstrated only for static environments (or, more generally, for fixed tasks).

In our previous papers [10, 11], we compared Darwinian and Lamarckian evolution
under non-stationary environments, and showed that the Darwinian population not
only behaved more stably in the face of environmental changes but also maintained
greater adaptability to such dynamic environments than the Lamarckian population.
While the Lamarckian population could adapt itself quickly to a single situation, it
had difficulty leaving that specific state of adaptation once it had been reached, owing
to its extremely greedy strategy for genetic inheritance. In this paper, which offers an
extended discussion using the same model as in [10], we observe the adaptability
of both populations toward a new, unknown world that is given after some evolutionary
steps have taken place under the original world. The populations are first
cultivated under a certain world (the original world), either static or non-stationary,
for a rather long period of generations. The populations are then given a
completely new world, and their adaptability toward this new world is evaluated
in terms of learning ability. Note that in this paper a non-stationary environment,
where environmental conditions change repeatedly, is itself regarded as
one type of world. We discuss the relation between the adaptability to a new
world and the level of innate errors of individuals’ neural networks.

2 Experimental Model 

Here we describe our experimental settings. A hundred individuals come into a
virtual “world”, each with 500 units of initial “life energy”. Each individual
has a feed-forward neural network that serves as its “brain”, meaning that the
individual takes action based on the network outputs (Figure 1). The neural
network has three layers, containing six input neurons, four hidden neurons and
eight output neurons respectively. Each neuron is fully connected to all neurons
of the next layer. We take an array of real numbers as a “chromosome” from
which the neural network is developed; the chromosome directly encodes all the
connective weights of the network [8]. Values of the chromosomes in the initial
generation are set randomly from the range −0.30 to 0.30.
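A direct weight encoding for the 6-4-8 network can be sketched as below. This is our illustration, not the authors' code. We assume the chromosome holds only connection weights with no bias terms, giving 6×4 + 4×8 = 56 genes. We also assume sigmoid units, and the function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

N_INPUT, N_HIDDEN, N_OUTPUT = 6, 4, 8
# Assumption: the chromosome holds only connection weights (no bias terms).
CHROMOSOME_LENGTH = N_INPUT * N_HIDDEN + N_HIDDEN * N_OUTPUT  # 24 + 32 = 56


def random_chromosome(rng: random.Random) -> List[float]:
    """Initial chromosome: every weight drawn uniformly from [-0.30, 0.30]."""
    return [rng.uniform(-0.30, 0.30) for _ in range(CHROMOSOME_LENGTH)]


def decode(chromosome: List[float]) -> Tuple[List[List[float]], List[List[float]]]:
    """Direct encoding: w1[i][h] links input i to hidden h, w2[h][o] hidden h to output o."""
    w1 = [chromosome[i * N_HIDDEN:(i + 1) * N_HIDDEN] for i in range(N_INPUT)]
    offset = N_INPUT * N_HIDDEN
    w2 = [chromosome[offset + h * N_OUTPUT:offset + (h + 1) * N_OUTPUT]
          for h in range(N_HIDDEN)]
    return w1, w2


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def forward(chromosome: List[float], inputs: List[int]) -> List[float]:
    """Fully connected 6-4-8 feed-forward pass (sigmoid units assumed)."""
    w1, w2 = decode(chromosome)
    hidden = [sigmoid(sum(inputs[i] * w1[i][h] for i in range(N_INPUT)))
              for h in range(N_HIDDEN)]
    return [sigmoid(sum(hidden[h] * w2[h][o] for h in range(N_HIDDEN)))
            for o in range(N_OUTPUT)]
```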

The world contains two groups of materials, “food” and “poison”, each of which
has distinctive features (Figure 3). Each material is represented by an array of
six bits. Food and poison are distinguished by the upper three bits, while the lower
three bits are set randomly for every new input as noise. Thus each individual tries
to maximize its chance of survival by learning the distinction rule, which in this case
corresponds to a three-bit parity problem. Whenever it is given a material, an
individual feeds the pattern of the material into its neural network and
stochastically determines whether to “eat” or “discard” it according to the network
outputs. These actions, however, are not mapped directly from the outputs themselves.
The network outputs are fed as signals to an “Action Decision Module” (Figure 1),
which then determines the action of the individual stochastically, according to a
Boltzmann distribution. This type of stochastic mechanism is necessary to keep open
the possibility of finding more advantageous behaviours, even if an individual has
already acquired an adequate behavioural pattern.

Fig. 1. The architecture of an individual.

Fig. 2. The genetic inheritance for Darwinian and Lamarckian evolution.
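A Boltzmann choice between the two actions can be sketched as a two-way softmax. The text does not specify how the eight network outputs are reduced to an “eat” score and a “discard” score, or what temperature is used. The sketch therefore assumes the two scores are already given and takes the temperature as a free parameter. The function name is hypothetical.

```python
import math
import random
from typing import Tuple


def choose_action(eat_score: float, discard_score: float, temperature: float,
                  rng: random.Random) -> Tuple[str, float]:
    """Boltzmann (softmax) choice between "eat" and "discard".

    P(eat) = exp(eat/T) / (exp(eat/T) + exp(discard/T)). A higher temperature T
    makes the choice more random, and a lower one makes it more greedy.
    """
    top = max(eat_score, discard_score)  # shift scores for numerical stability
    w_eat = math.exp((eat_score - top) / temperature)
    w_discard = math.exp((discard_score - top) / temperature)
    p_eat = w_eat / (w_eat + w_discard)
    action = "eat" if rng.random() < p_eat else "discard"
    return action, p_eat
```

Because the less favoured action always keeps a non-zero probability, an individual that has settled on an adequate behaviour still occasionally tries the alternative.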

If what the individual ate was food, it receives 10 units of energy and trains
itself to produce the “eat” action with a higher probability for that pattern.
Conversely, if the individual ate poison, it loses a comparable amount of energy
and trains itself to produce the “discard” action with a higher probability
for that pattern. When the individual discards the material, no learning is
conducted. The aim of each individual is to maximize its life energy by learning a
rule that discriminates food from poison. We use the Back Propagation Learning
(BP Learning) algorithm, in combination with a simple Reinforcement Learning
framework, to train each individual. The coefficients of learning and inertia of
BP Learning are set at $\eta = 0.75$ and $\alpha = 0.8$, respectively.
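The learning and inertia coefficients enter the standard momentum form of the back-propagation weight update, $\Delta w(t) = -\eta\,\partial E/\partial w + \alpha\,\Delta w(t-1)$. The sketch below assumes the gradients have already been computed by back-propagation. We also assume the target for the error E is the action to be reinforced: “eat” after food, “discard” after poison. The function names are illustrative.

```python
from typing import List, Tuple

ETA = 0.75    # learning coefficient (eta)
ALPHA = 0.8   # inertia (momentum) coefficient (alpha)


def momentum_update(weights: List[float], gradients: List[float],
                    previous_deltas: List[float], eta: float = ETA,
                    alpha: float = ALPHA) -> Tuple[List[float], List[float]]:
    """One BP step with inertia: delta(t) = -eta * dE/dw + alpha * delta(t-1)."""
    deltas = [-eta * g + alpha * d for g, d in zip(gradients, previous_deltas)]
    return [w + d for w, d in zip(weights, deltas)], deltas


def training_target(ate_food: bool) -> str:
    """Action whose probability is reinforced after eating (no learning on discard)."""
    return "eat" if ate_food else "discard"
```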
Fig. 3. A static environment. Whether the material is food or poison is determined by
the parity property of upper three bits. The material is food if the number of black 
cells in the upper three bits is even, otherwise poison. The symbol means don’t 
care whether the cell is black or white; these cells have no effect on the distinction rule. 
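The materials and the parity rule of Fig. 3 can be generated as follows. We assume a black cell is a 1 and that the “upper” bits are the first three list elements. The function names are illustrative. The proportion of food to poison offered during a lifetime is not stated here, so the caller chooses the class.

```python
import random
from itertools import product
from typing import List


def is_food(material: List[int]) -> bool:
    """Fig. 3 rule: food if the upper three bits hold an even number of 1s."""
    return sum(material[:3]) % 2 == 0


def make_material(food: bool, rng: random.Random) -> List[int]:
    """Six bits: three class-bearing upper bits plus three random noise bits."""
    while True:
        upper = [rng.randint(0, 1) for _ in range(3)]
        if is_food(upper) == food:  # rejection-sample until the class matches
            break
    noise = [rng.randint(0, 1) for _ in range(3)]
    return upper + noise


# Exactly half of the 64 possible six-bit patterns are food.
print(sum(is_food(list(p)) for p in product((0, 1), repeat=6)))  # 32
```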



Each individual is offered a certain number of materials, one at a time, and
learning occurs. We regard this number of repeated events as the length of an
individual’s “lifetime”, which is set at 400 in this paper. At the end of each
generation, some of the individuals are selected as parents by a stochastic criterion
proportional to their energy level, which is thus regarded as their
fitness. Selected parents then reproduce new offspring according to their genetic
strategy, which is either Darwinian or Lamarckian. The connective weights of
the individuals’ neural networks will have been modified through their lifetime
learning; the result of this modification is what we here call “acquired characters”.
Darwinian individuals do not transmit the acquired characters to the next generation;
instead they simply pass the chromosomes they inherited from their parents to the
process of GAs (Figure 2(a)). Lamarckian individuals, on the other hand, re-encode the
acquired characters into their chromosomes and pass them to the process of GAs
(Figure 2(b)). Chromosomes of the selected individuals undergo the genetic processes
of recombination and mutation. Here, the number of crossover points is set randomly
from the range 0 to 4. Mutation occurs at each locus of the chromosome with a rate of
5%, with a variation range of ±0.5. The selected parents thus reproduce new offspring,
which then undergo lifetime learning in the following generation.
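The only difference between the two strategies is which weights enter the genetic operators. The sketch below makes that explicit. It is an illustration under stated assumptions, not the authors' code. Individuals are plain dicts with hypothetical keys `energy`, `chromosome` (weights at birth) and `learned` (weights after lifetime learning). Negative energies are clipped to zero for selection. The ±0.5 mutation is assumed to be uniform.

```python
import random
from typing import List


def select_parent(population: List[dict], rng: random.Random) -> dict:
    """Roulette-wheel selection with probability proportional to life energy
    (negative energies clipped to zero; uniform choice if all are zero)."""
    weights = [max(float(ind["energy"]), 0.0) for ind in population]
    if sum(weights) == 0.0:
        return rng.choice(population)
    return rng.choices(population, weights=weights, k=1)[0]


def genes_to_pass(ind: dict, lamarckian: bool) -> List[float]:
    """Darwinian: the chromosome inherited at birth. Lamarckian: the weights
    after lifetime learning, re-encoded as a chromosome."""
    return list(ind["learned"] if lamarckian else ind["chromosome"])


def multipoint_crossover(a, b, rng, max_points=4):
    """Crossover with a random number (0 to max_points) of cut points."""
    points = sorted(rng.sample(range(1, len(a)), rng.randint(0, max_points)))
    child, parents, source, start = [], (a, b), 0, 0
    for cut in points + [len(a)]:
        child.extend(parents[source][start:cut])
        source, start = 1 - source, cut
    return child


def mutate(genes, rng, rate=0.05, spread=0.5):
    """Each locus mutates with probability `rate` by a uniform offset in [-spread, spread]."""
    return [g + rng.uniform(-spread, spread) if rng.random() < rate else g
            for g in genes]


def reproduce(population, rng, lamarckian):
    mother = genes_to_pass(select_parent(population, rng), lamarckian)
    father = genes_to_pass(select_parent(population, rng), lamarckian)
    return mutate(multipoint_crossover(mother, father, rng), rng)
```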

Although the parameters in this paper are set heuristically according to some 
preliminary experiments, we have confirmed that changing these values within a 
moderate range results in qualitatively similar outcomes. All the results shown 
in the following are averages of ten trials. 

3 Experimental Evaluations 

3.1 Experiment 1: Static environment 

Firstly, we consider a static world where the distinction rule between food and
poison does not change, and observe the adaptive processes of the Darwinian
and the Lamarckian populations.

Figure 4 shows the changes in the average fitness of the populations through
generations. As the figure shows, the Lamarckian population adapts to this static
world more quickly and effectively than the Darwinian population does. The result
meets our expectations quite well, since the Lamarckian population can continue the
learning process that their parents suspended part-way in the previous generation,
while the Darwinian population is forced, to some extent, to make a fresh start in
each generation.



Fig. 4. Average fitness (static environment; x-axis: generations).

Fig. 5. Average innate errors (static environment; x-axis: generations).



Figure 5 shows the changes in the output errors of the innate neural networks
through generations. Network output error is measured by the mean squared error,
over every pattern, between the actual outputs and the ideal outputs. Innate
output errors are the errors that individuals show before any learning. Therefore,
innate-error curves that decline through generations mean that, in later generations,
individuals behave more appropriately from birth. As shown in Figure 5, the
Lamarckian individuals decrease their innate errors far more quickly than the
Darwinian individuals do.
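Innate error can be computed by running the untrained network over all 64 six-bit input patterns. This is a hedged sketch. The ideal output vector for each pattern is not given in this section, so the caller supplies a hypothetical `ideal_output` function. Averaging over output units as well as over patterns is our assumption.

```python
from itertools import product
from typing import Callable, List, Sequence

ALL_PATTERNS = [list(bits) for bits in product((0, 1), repeat=6)]  # all 64 inputs


def innate_error(network: Callable[[List[int]], Sequence[float]],
                 ideal_output: Callable[[List[int]], Sequence[float]],
                 patterns: Sequence[List[int]] = ALL_PATTERNS) -> float:
    """Mean squared error between actual and ideal outputs, averaged over
    every pattern and every output unit, measured before any learning."""
    total, count = 0.0, 0
    for pattern in patterns:
        actual, ideal = network(pattern), ideal_output(pattern)
        total += sum((a - t) ** 2 for a, t in zip(actual, ideal))
        count += len(ideal)
    return total / count
```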

Note in Figure 5 that the innate errors do decrease even for the Darwinian
population, where acquired characters cannot be transmitted from parents to
children through genetic processes. This can be explained from the point of view
of the “Baldwin Effect” [3]. As Hinton and Nowlan showed in their seminal work [6],
even if evolution is Darwinian, characters first acquired through learning can later
be assimilated into the genotype as the generations proceed. One more thing to note
here is that the innate errors of the Darwinian population do not decrease
monotonically from the initial generation: there is a short period in which their
innate errors increase, so the curve has a bump. We discuss this point in Section 4.



3.2 Experiment 2: Non-stationary Environment 

Next, we consider a non-stationary world where each material is the same as in
Experiment 1. Now, however, the distinction rules undergo irregular changes, as
shown in Figure 6. The following shows the results for a situation where the
distinction rule is changed randomly at intervals of 20 generations.

Figures 7(a) and 7(b) show the changes in the average fitness of the Darwinian
and the Lamarckian populations, respectively. As one can intuitively
Fig. 9. Setup for Experiment 3. Populations that have been cultivated for some
generations under either (1) the static world or (2) the non-stationary world are
picked up and put into a new world.



imagine, the fitness of the Lamarckian population oscillates violently whenever
environmental conditions change, and thus it cannot adapt itself to this
non-stationary world. On the other hand, the point we should especially
emphasize is that the fitness of the Darwinian population rises as the generations
proceed, although oscillation is observed. We have confirmed in further
experiments that the Darwinian genetic mechanism forms, through evolution, a
population of individuals that can cope appropriately with any of the rules given
in Figure 6 (see [10] for more details).
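The non-stationary world can be represented as a schedule of distinction rules, one per generation, that switches every 20 generations. The actual rules of Figure 6 are not reproduced in this section. The sketch therefore uses the Fig. 3 even-parity rule plus a hypothetical odd-parity rule as stand-ins. We also assume each change picks a rule different from the current one.

```python
import random
from typing import Callable, List, Sequence

Rule = Callable[[Sequence[int]], bool]  # material -> is it food?


def even_parity(material: Sequence[int]) -> bool:
    """Fig. 3 rule: food if the upper three bits contain an even number of 1s."""
    return sum(material[:3]) % 2 == 0


def odd_parity(material: Sequence[int]) -> bool:
    """Hypothetical alternative rule (the inverse of the Fig. 3 rule)."""
    return sum(material[:3]) % 2 == 1


def rule_schedule(rules: Sequence[Rule], generations: int, interval: int = 20,
                  seed: int = 0) -> List[Rule]:
    """Rule in force at each generation. Every `interval` generations a
    different rule is drawn at random (assumed never to repeat the current one)."""
    rng = random.Random(seed)
    current = rng.randrange(len(rules))
    schedule = []
    for generation in range(generations):
        if generation > 0 and generation % interval == 0:
            current = rng.choice([i for i in range(len(rules)) if i != current])
        schedule.append(rules[current])
    return schedule
```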

Figures 8(a) and 8(b) show the changes in the average innate errors of the
Darwinian and the Lamarckian populations, respectively. As we can see, the innate
errors of the Darwinian population show smaller oscillations than those of the
Lamarckian population. It is interesting to note that the innate errors of the
Darwinian population first rise a little in the early generations and are
subsequently maintained at a rather higher level, unlike the previous experiment,
where the errors decrease via genetic assimilation in later generations (Figure 5).
That is to say, individuals become worse in terms of their innate behaviour. However,
their learning ability improves, and thus their fitness over their whole life increases
as the generations proceed.



3.3 Experiment 3: Adaptability toward New World

Now, we observe the adaptability of the Darwinian and the Lamarckian populations
toward a new, unknown world, which is given after some evolutionary steps
have taken place under the original world. First, we pick up populations that
have been cultivated for some generations under the original world: either the
static world (Experiment 1) or the non-stationary world (Experiment 2). Then
we put them into a completely new world and evaluate their adaptability toward
this new world in terms of their learning ability (Figure 9). In the new world,
materials are distinguished by the upper four bits, and the lower two bits are
noise. Individuals cannot solve the four-bit parity problem with the three-bit
rule they acquired in the previous world, so they have to learn the new rule in
order to survive.
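The three-bit rule is no better than chance in the new world. Four-bit parity equals three-bit parity combined by XOR with the fourth bit, so the two rules agree exactly when the fourth bit is 0, which is half of all patterns. A quick check, assuming the “upper” bits are the first list elements:

```python
from itertools import product


def parity_is_food(material, n_signal_bits):
    """Food if the upper n_signal_bits contain an even number of 1s."""
    return sum(material[:n_signal_bits]) % 2 == 0


patterns = list(product((0, 1), repeat=6))
agreements = sum(parity_is_food(p, 3) == parity_is_food(p, 4) for p in patterns)
print(agreements / len(patterns))  # 0.5
```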

Figures 10(a) and 10(b) show the learning curves under the new world for
both populations, which were originally cultivated under the static world (Figure
9(1)). Figure 10 shows the changes in the network output errors through their
learning steps. If individuals are adaptable to the new world and can learn the
new rule, their learning curve declines as the learning steps proceed.
As expected, Lamarckian individuals cannot learn the new rule at all, since
they have adapted deeply to the specific environment under which they were
cultivated. Darwinian individuals that have been cultivated for many generations
under the original world cannot learn the new rule either. However, those that
have been cultivated for rather short periods, say up to 1000 generations, increase
their adaptability even toward the new world (Figure 10(a)). The population that
has been cultivated for about 600 generations shows the best adaptability.

Figures 11(a) and 11(b) show the learning curves under the new world for
both populations, which were originally cultivated under the non-stationary
world (Figure 9(2)). The Lamarckian individuals of any generation cannot learn
the new rule at all. On the other hand, Darwinian populations that have been
cultivated for some generations, say about 600, can learn the new rule to some
extent. Note that in this case, unlike the populations originally cultivated under
the static world, the Darwinian individuals maintain their adaptability toward the
new world even after they have been cultivated in the original world for a larger
number of generations.

4 Discussion 

From the results for the Darwinian population, it seems that evolution under
the static world falls into two stages. First, in the earlier stage of evolution,
individuals with better learning ability for the given rule are selected, and
thus selection pressure develops the “ability to learn”. The increase of the innate
errors in the earlier stage of evolution, shown in Figure 5, means that the “ability
to learn” is more important than the “ability to perform (innately)” at this
stage, and is hence selected for. We suggest here that the “ability to learn” may
not be specific to the given rule but rather a general one, since the population
increases its adaptability even toward the unknown new world (Figure 10(a)).
Second, once the whole population is occupied by individuals with high learning
ability, selection pressure comes to prefer individuals that show innately better
behaviours, and thus develops the “ability to perform (innately)”, as shown by the
decrease of the innate output errors (Figure 5). Therefore, the population comes to
adapt itself deeply to the given world, and loses adaptability toward the new world
as the generations proceed (Figure 10(a)). Although the generation of the most
adaptable population toward the new world shown in Figure 10(a) does not coincide
exactly with the generations around the bump of the innate output errors shown in
Figure 5, there may be some relation between the non-monotonicity in the two cases.

On the other hand, under the non-stationary world, where environmental 
conditions undergo changes, individuals cannot devote themselves to a certain 
specific condition, but must maintain the “ability to learn”. As the “ability 
to learn” is a general ability, they also maintain adaptability toward the 
unknown new world, and the second phase that occurs under the static world, 
i.e., selection for the “ability to perform (innately)”, does not occur in this case 
(Figure 11(a)). The innate output errors are maintained at a certain higher level, 
as shown in Figure 8. This also suggests that there may be some relation 
between adaptability toward the new world and the level of innate output 
errors. 

5 Conclusion 

We evaluated how learning with inheritance of acquired characters affects the 
evolution of the population in a simple abstract model where neural networks 
capable of learning are evolved through GAs. The results obtained were as follows. 
Firstly, while Lamarckian individuals could adapt themselves to a static 
world quite quickly and effectively, they had difficulty in leaving the specific 
state of adaptation once it had been reached and therefore could not cope with 
another new world at all. Secondly, Darwinian individuals that had been 
cultivated under a static world, surprisingly, increased their adaptability even toward 
another new world if the new world was given in a relatively early generation, but 
their adaptability declined as the change occurred in later generations. Lastly, 
Darwinian individuals that had been cultivated under a non-stationary world 
also increased their adaptability toward another new world, and could maintain 
this adaptability even after many generations in the earlier world. 
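The difference between the two inheritance schemes compared above can be made concrete with a toy sketch. It is not the model used in the experiments: the linear "network" (a plain weight vector), the gradient-step learning rule, the fixed target standing in for a static world, and the truncation selection are all hypothetical simplifications chosen only to show where Lamarckian and Darwinian inheritance differ.

```python
import random

def learn(w, target, steps=5, lr=0.2):
    """Lifetime learning (hypothetical): gradient steps on squared error to target."""
    w = list(w)
    for _ in range(steps):
        w = [wi - lr * 2 * (wi - ti) for wi, ti in zip(w, target)]
    return w

def error(w, target):
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))

def next_generation(pop, target, lamarckian, rng, sigma=0.05):
    learned = [learn(w, target) for w in pop]
    # Fitness is judged on performance after learning in both schemes.
    ranked = sorted(range(len(pop)), key=lambda i: error(learned[i], target))
    children = []
    for i in ranked[: len(pop) // 2]:            # truncation selection (assumption)
        # Lamarckian: learned weights are written back into the genotype.
        # Darwinian: only the innate (pre-learning) weights are inherited.
        base = learned[i] if lamarckian else pop[i]
        for _ in range(2):
            children.append([b + rng.gauss(0.0, sigma) for b in base])
    return children

def evolve(lamarckian, generations=10, size=20, dim=4, seed=0):
    """Return the mean innate (pre-learning) error after evolution."""
    rng = random.Random(seed)
    target = [1.0] * dim
    pop = [[rng.uniform(-2, 2) for _ in range(dim)] for _ in range(size)]
    for _ in range(generations):
        pop = next_generation(pop, target, lamarckian, rng)
    return sum(error(w, target) for w in pop) / len(pop)
```

Under these assumptions, `evolve(True)` drives the innate error down far faster than `evolve(False)`, in line with the quick adaptation of Lamarckian individuals to a static world; the sketch says nothing by itself about adaptability to a new world.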
Abstract. We have proposed a theoretical hypothesis called the Programmed 
Self-Decomposition (PSD) model [1, 2]. The PSD model assumes that a 
self-decomposition mechanism is programmed in each cell of all living organisms on 
earth, and that this mechanism contributes to the substantial and spatial restoration 
of the ecosystem. In this article, we introduce an overview of the 
hypothesis and report on computer simulations in which both a mortal virtual life 
form based on the PSD model and an immortal virtual life form start their lives in 
the same finite, heterogeneous ecosystem. Results of the simulations suggest that, 
contrary to ordinary expectations, the mortal life forms are not exterminated in almost all cases. 
It seems that the effectiveness of "death" cannot be denied. 



1 “Programmed Self-Decomposition” Model 

1.1 Restoration Mechanism of the Terrestrial Ecosystem 

We would like to investigate the effectiveness of “death.” We suppose that death 
contributes to the restoration of the ecosystem and to the advancement of evolution, 
and that death itself was obtained as a fruit of evolution. 

The terrestrial ecosystem can still be characterized as being almost closed, in that 
space and substance are both limited. Accordingly, to maintain the stability of 
terrestrial life-form activity, the space and substance removed from the environment by 
the activity of the life forms themselves have to be returned to the environment. 

The mechanism of restoring the terrestrial ecosystem to its original state has 
conventionally been explained by the principle of biological circulation called the food 
chain - mutually giving oneself as food to other living organisms and depending upon 
metabolic activity to attain restoration of the environment as a whole. 

We have set up a new hypothesis that is complementary to that of the food chain. In 
the present terrestrial ecosystem, while such a mechanism of restoring the environment 
to its original state is occurring, another hidden mechanism is fundamentally built into 
every living organism, by which each cell of that organism actively decomposes itself 
so as to contribute to the restoration of the environment. 

The phenomenon of decomposing oneself by one's own force is observed in life on 
earth. Called “autolysis,” it has so far been recognized as a process of destruction, i.e., 
of living organisms randomly directed from a state of order to one of disorder. 
Our hypothesis, on the other hand, regards such a phenomenon as a controlled 
biochemical process with the given purpose of restoring the environment, so as to 
guarantee effective utilization of restored nutrients and space. To distinguish this 
process from autolysis, we call it “self-decomposition.” 

1.2 Theoretical Definition of Self-Decomposition 

We have developed a new concept of a self-reproductive, self-decomposable 
automaton [1] through modification of von Neumann's self-reproductive automaton 
model [3]. It can be summarized as follows: 

(1) Automaton A constructs another automaton according to instruction I; 

(2) Automaton B makes a copy of instruction I; 

(3) Mechanism C combines A and B and functions as follows: (3-1) let A construct 
another automaton according to I; (3-2) let B make a copy of I and insert it into the 
automaton constructed above; (3-3) separate the new automaton from the system 
A + B + C; 

(4) Automaton D consists of A + B + C; 

(5) Instruction I_D describes automaton D; 

(6) Automaton E consists of D + I_D, which can reproduce itself; 

(7) Instruction I_{D+F} describes automaton D plus another given automaton F; 

(8) Automaton E_F consists of D + I_{D+F}, which can reproduce itself and construct 
another automaton F. 
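The bookkeeping in steps (1) to (8) can be checked with a purely symbolic sketch. It is only an illustration under our own assumptions: an automaton is reduced to a tuple of part names plus the instruction it carries, and "construction" simply turns a description into parts; the names `Automaton`, `construct`, `copy_instruction` and `reproduce` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Automaton:
    parts: tuple        # constructed components, e.g. ("A", "B", "C")
    instruction: tuple  # description carried by the automaton (I)

def construct(instruction):
    # Automaton A: build the machinery the instruction describes (no instruction yet).
    return Automaton(parts=tuple(instruction), instruction=())

def copy_instruction(instruction):
    # Automaton B: make a verbatim copy of the instruction.
    return tuple(instruction)

def reproduce(machine):
    # Mechanism C can only act if A, B and C are all present.
    if not {"A", "B", "C"} <= set(machine.parts):
        raise ValueError("machine lacks A, B or C and cannot reproduce")
    child = construct(machine.instruction)                                  # (3-1)
    child = Automaton(child.parts, copy_instruction(machine.instruction))  # (3-2)
    return child                                                            # (3-3)

D = ("A", "B", "C")
E = Automaton(parts=D, instruction=D)             # E = D + I_D
E_F = Automaton(parts=D, instruction=D + ("F",))  # E_F = D + I_{D+F}
```

In this toy, `reproduce(E) == E`, while the offspring of `E_F` contains F as an extra part and then breeds true, which is the sense in which E_F "can reproduce itself and construct another automaton F".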

This model continues to self-reproduce as long as there is sufficient space, material 
and energy, and its structure remains intact unless it is attacked by external forces. It is 
immortal. In contrast, living organisms on earth die without exception, and unless 
special measures are taken, they degrade into their components after death. This is the 
essential difference between von Neumann's model and terrestrial life. Taking this 
into account, we have developed the following model, using von 
Neumann's model as a prototype, into which the process of “death and decomposition” 
is programmed in advance. This new model can be expressed as a variation of von 
Neumann’s self-reproductive automaton E_F. To be more specific: 

(a) Automaton FZ, which has the ability to disassemble the whole system into component 
elements, is a modular subsystem comparable to von Neumann's automaton F. 

(b) Instruction I_{D+FZ} describes automaton D plus automaton FZ. 

(c) Automaton E_FZ is a system comparable to von Neumann's automaton E_F whose 
instruction I_{D+F} is replaced by instruction I_{D+FZ}. 

(d) Automaton G is a system composed of E_FZ and FZ, viz., D + FZ + I_{D+FZ}. 

This system G can reproduce itself, and therefore it also reproduces FZ as a subsystem within the 
system. FZ has the ability to disassemble G into finite elements. These elements are sized 
and structured in such a way that the entire ecosystem that G belongs to may take 
advantage of them collectively. FZ’s mode of action can be one of the following three: 

(1) Its production is normally restricted. With the input of a particular message, 
however, the restriction is lifted and the production activated. 

(2) Its operation is normally restricted. With the input of a particular message, 
however, the restriction is lifted and the operation activated. 

(3) Both (1) and (2) together. 

Furthermore, if, after a certain amount of time has passed or a certain set of events 
has occurred, there is still no message input to trigger an action, (1), (2), or (3) 
happens automatically. The “particular message” referred to above is provided 
when it becomes impossible for G to multiply itself further or to maintain its structure, 
that is, when it is on the verge of extinction. If G is not given such a message for a 
certain period of time, G automatically puts FZ into operation. Therefore, we regard G 
as an automaton in which a programme of spontaneous and inevitable “death and 
decomposition” is installed a priori. 
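The trigger logic of FZ (three modes of restriction, lifted either by the extinction message or automatically after a delay) can be sketched as a small state machine. This is a hypothetical illustration, not part of the PSD model's specification: the class and field names are ours, and "a certain set of events" is simplified to a single step counter.

```python
from dataclasses import dataclass

@dataclass
class FZModule:
    mode: int     # 1: production restricted, 2: operation restricted, 3: both
    timeout: int  # time steps after which FZ fires without any message
    age: int = 0

    def __post_init__(self):
        if self.mode not in (1, 2, 3):
            raise ValueError("mode must be 1, 2 or 3")
        # Mode 2: FZ already exists, only its operation is blocked.
        # Mode 1: operation is not blocked, only production is.
        self.produced = self.mode == 2
        self.operation_enabled = self.mode == 1

    def lift_restrictions(self):
        if self.mode in (1, 3):
            self.produced = True
        if self.mode in (2, 3):
            self.operation_enabled = True

    def step(self, extinction_message=False):
        """Advance one time step; return True if G is now decomposing."""
        self.age += 1
        if extinction_message or self.age >= self.timeout:
            self.lift_restrictions()
        return self.produced and self.operation_enabled
```

In all three modes G is intact until either the message arrives or the timeout expires, after which FZ is both present and operating; the modes differ only in which restriction had to be lifted.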

System G composed as described above not only reproduces itself, but also has the 
ability to put an end to its own life and to return to its origins, that is, to contribute to 
the restoration of the ecosystem to its original state. We tentatively call this system a 
“self-reproduction/self-decomposition (SRSD) system” and call the theoretical model 
under discussion a “programmed self-decomposition (PSD) model.” 

2 A Study on Virtual Life - “SIVA-3” 

2.1 Motivation 

We adopted two approaches in order to test this model: one, biological experiments 
using actual life forms; the other, computer simulations using virtual life forms. 

We developed the simulator series “SIVA (Simulator for Individuals of Virtual 
Automata)” in order to investigate the effectiveness of “death” [2]. Another 
investigation based on a similar concept, using cellular automata (CA), is currently 
being conducted by Sayama [4]. Using SIVA-3, we simulated the activities of artificial 
lives in a finite and heterogeneous ecosystem similar to the actual terrestrial ecosystem. 
We found that a mortal life form based on the PSD model has a fundamental advantage 
over an immortal life form when they live independently in two similarly conditioned 
environments. The mortal life form can contribute to the restoration of the simulated 
ecosystem and continuously reuse limited space and substances. It also has greater 
opportunity for evolutionary adaptation [2]. However, since an immortal life form 
occupies space and uses substances eternally, other life forms, including members of its 
own species, may have difficulty proliferating in the same ecosystem. A situation in which 
immortal life forms exterminate mortal life forms can therefore be anticipated if the two 
types of life forms live in the same ecosystem. To test this, we prepared simulations in 
which an immortal life form and a mortal life form start their lives in the same finite and 
heterogeneous ecosystem. SIVA-3 was arranged as described below. 

2.2 Design of the Simulations 

SIVA-3 was designed as previously described [2]. The virtual space in SIVA-3 is 
assumed to be a 2-dimensional lattice of 128x128 pixels. One pixel is defined as the 
spatial unit that one individual of virtual life must occupy to exist. The virtual 
substances of which the ecosystem and the artificial lives in SIVA-3 are composed are 
restricted to four types of elements. Environmental heterogeneity is introduced into 
256 spatial blocks (each consisting of 8x8 pixels) on the basis of the quantities of 
substances. These conditions range from those with the maximum conformity for 
primitive life to those with the minimum conformity, that is, conditions under 
which primitive life cannot survive or reproduce itself unless it adapts to 
them by taking many evolutionary steps. Although SIVA-3 can also introduce heterogeneity 
based on temperature, this was not adopted in the current simulations, in order to keep 
the simulation conditions simple. Sufficient energy for life activities is assumed to be 
given [2]. 
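The spatial layout can be summarized in a few lines: a 128x128 lattice divided into 8x8 blocks yields 16 x 16 = 256 blocks, each carrying its own quantities of the four substance types. This sketch is ours, not SIVA-3's code; the random per-block quantities and the `Lattice` class are hypothetical placeholders for the actual environmental settings.

```python
import random

GRID = 128                       # lattice side in pixels
BLOCK = 8                        # block side in pixels
BLOCKS_PER_SIDE = GRID // BLOCK  # 16
N_BLOCKS = BLOCKS_PER_SIDE ** 2  # 256 heterogeneous blocks
N_SUBSTANCES = 4                 # four types of elements

def block_of(x, y):
    """Index (0..255) of the 8x8 block containing pixel (x, y)."""
    return (y // BLOCK) * BLOCKS_PER_SIDE + (x // BLOCK)

def make_environment(seed=0, max_qty=100):
    """Hypothetical per-block substance quantities (not SIVA-3's actual values)."""
    rng = random.Random(seed)
    return [[rng.randint(0, max_qty) for _ in range(N_SUBSTANCES)]
            for _ in range(N_BLOCKS)]

class Lattice:
    """At most one individual may occupy each pixel."""
    def __init__(self):
        self.occupant = {}

    def place(self, x, y, individual):
        if not (0 <= x < GRID and 0 <= y < GRID) or (x, y) in self.occupant:
            return False
        self.occupant[(x, y)] = individual
        return True
```

The one-individual-per-pixel rule is what makes space a limited resource: an immortal individual that never vacates its pixel permanently removes that site from use.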
Both an immortal non-PSD life form based on von Neumann's self-reproductive 
automaton model and a mortal PSD life form were designed; the latter is the immortal 
life form with an added modular subsystem that has the ability to disassemble the whole 
system into component elements. They consist of VC (instructions I, including I_D and 
I_{D+FZ}) and VP (automata, including A, B, FZ, and mechanism C) [2]. An internal status 
variable UNCONF (a non-negative integer) was newly installed for the current simulations 
as a measure of conformity with the environment. The default value of UNCONF is 0, 
which corresponds to complete conformity with the environment. When a command to 
reproduce a life form fails in execution, the individual perceives unconformity with the 
environment and increases the value of its internal status variable, UNCONF, by 1 per 
failure. 

The following two global variables were introduced in order to investigate the 
requirements for the extermination of mortal life forms by immortal life forms. 

Temporary adaptability. TEMP-ADAPT (between 0.0 and 1.0) was designed as the 
degree of temporary adaptability, that is, of the ability to rearrange current activities and 
get through the situation without evolutionary adaptation. In this investigation, the 
probability that each of the commands for the self-reproductive activities is executed was 
calculated as TEMP-ADAPT raised to the power of UNCONF. 
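The interaction between UNCONF and TEMP-ADAPT can be sketched as follows. This is a hypothetical reading, not SIVA-3's code: we assume that only a reproduction command that is executed but fails (here, because a hypothetical `environment_permits` flag is false) counts as a failure and increments UNCONF, while a command skipped under the TEMP-ADAPT probability does not.

```python
import random

def execution_probability(temp_adapt, unconf):
    """TEMP-ADAPT ** UNCONF. Python defines 0.0 ** 0 == 1.0, so a fully
    conforming individual (UNCONF = 0) always executes its commands."""
    if not 0.0 <= temp_adapt <= 1.0 or unconf < 0:
        raise ValueError("TEMP-ADAPT must be in [0, 1] and UNCONF non-negative")
    return temp_adapt ** unconf

class Individual:
    def __init__(self):
        self.unconf = 0  # default: complete conformity with the environment

    def attempt_reproduction(self, temp_adapt, environment_permits, rng):
        """Return True if a reproduction command is executed and succeeds."""
        if rng.random() >= execution_probability(temp_adapt, self.unconf):
            return False      # command skipped (assumed not to count as a failure)
        if not environment_permits:
            self.unconf += 1  # one failure -> UNCONF increases by 1
            return False
        return True
```

With TEMP-ADAPT = 1.0 an individual keeps executing its commands however often it fails; with lower values, each failure multiplies the chance of further activity by TEMP-ADAPT, so repeated failures quickly suppress it.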

Mutation Rate. With a probability of misvcrate (between 0.0 and 1.0) per character of VC, 
a substitution occurs when the VC is copied. Masking against mutation was applied 
to the parts of the VC corresponding to automata A, B, C, and FZ, so as to 
prevent individuals from losing the functions of those portions. 
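The copying rule with masking can be sketched as a per-character substitution. The VC alphabet and the way masked positions are recorded (a set of character indices) are our assumptions; the text specifies only that substitutions occur with probability misvcrate per character and that the portions for A, B, C and FZ are protected.

```python
import random

def copy_vc(vc, misvcrate, masked, alphabet, rng):
    """Copy a VC string; each unmasked character is replaced, with probability
    misvcrate, by a different character drawn from the (hypothetical) alphabet."""
    out = []
    for i, ch in enumerate(vc):
        if i not in masked and rng.random() < misvcrate:
            out.append(rng.choice([a for a in alphabet if a != ch]))
        else:
            out.append(ch)  # masked or not mutated: copied faithfully
    return "".join(out)
```

Because masked characters are never touched, even a very high misvcrate cannot destroy the reproductive machinery or FZ; only the remaining, unprotected part of the VC evolves.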

2.3 Simulations 

Methods. One immortal life form and one mortal life form based on the 
PSD model were seeded in the center of the same ecosystem in SIVA-3. There were 36 
simulation conditions in all, combining 6 cases of temporary adaptability (TEMP-ADAPT: 
0.0, 0.2, 0.4, 0.6, 0.8, and 1.0) with 6 cases of mutation rate (misvcrate: 
1x10^-6, 1x10^-5, 1x10^-4, 1x10^-3, 1x10^-2, and 1x10^-1). Each simulation was carried 
out 10 times from TC (Time Count) = 0 to TC = 1000. The extermination ratio of 
mortal life forms and the accumulated frequency of mutation were calculated for each 
of the 36 cases. 
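The experimental grid is a full factorial design of 6 x 6 = 36 conditions, each replicated 10 times. A minimal harness, assuming a hypothetical `simulate` callback that stands in for one SIVA-3 run and reports whether all mortal life forms were exterminated by TC = 1000, could compute the extermination ratio per condition like this:

```python
from itertools import product

TEMP_ADAPT = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
MISVCRATE = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
REPLICATES = 10

def extermination_ratios(simulate):
    """simulate(temp_adapt, misvcrate, seed) -> True if all mortal forms died out."""
    ratios = {}
    for temp_adapt, rate in product(TEMP_ADAPT, MISVCRATE):
        outcomes = [simulate(temp_adapt, rate, seed) for seed in range(REPLICATES)]
        ratios[(temp_adapt, rate)] = sum(outcomes) / REPLICATES
    return ratios
```

A ratio of 1.0 corresponds to "inevitable" extermination, a ratio strictly between 0 and 1 to extermination that "sometimes occurred", and 0.0 to none.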

Results. Inevitable extermination of the mortal life forms occurred in only one case 
(TEMP-ADAPT: 1.0; misvcrate: 1x10"^). In 3 other cases, extermination occurred in 
some of the runs. In all of the remaining 32 cases, the mortal life forms were never 
exterminated (Fig. 1(A)). 

The immortal life forms succeeded in occupying a relatively broad region only 
when TEMP-ADAPT had an extremely high value (1.0). The mortal life forms, on the 
other hand, increased on a large scale and extended to broader regions when 
TEMP-ADAPT was not extremely high. Both showed the tendency that the 
higher the mutation rate, the broader the habitat distribution (Fig. 1(B)). 

Accumulated frequencies of mutation of the mortal life forms were higher than those 
of the immortal life forms. In the case of TEMP-ADAPT: 0.4 and misvcrate: 1x10'^, for 
example, the sum of the mutation frequency of the mortal life forms between TC = 0 and 
TC = 1000 exceeded 60000, whereas that of the immortal life forms was about 100. 
Fig. 1 (A) Extermination ratio of the mortal life forms by the immortal life forms. The 
number of simulations (out of 10) in which all individual mortal life forms had been 
exterminated was counted for each of the 36 cases. (B) Typical distributions of virtual life 
forms after 1000 TC simulations. This matrix shows the results of 15 typical simulations 
for the above conditions. The black mass denotes the immortal life forms and the white dots 
denote the mortal life forms. 

Discussion. These simulation results suggest that extremely specific conditions 
(i.e., the highest temporary adaptability and a relatively low mutation rate) are required 
for immortal life forms to exterminate mortal life forms. This means that the 
probability of extermination of the mortal life forms by the immortal life forms was 
essentially zero in practical situations, except in the situation in which temporary 
adaptation to any kind of environment can be perfectly achieved. 

The PSD mechanism contributed to the restoration of the ecosystem and advanced 
evolution in the current simulations, as previously discussed [2]. Consequently, the 
species of mortal life forms became abundant, and they occupied a broader area of 
the heterogeneous ecosystem than the immortal life forms did. A complex 
ecosystem was also realized. It is noteworthy that these findings were obtained in 
simulations in which the mortal and the immortal life forms co-existed in the 
same finite, heterogeneous ecosystem. The present results suggest that it 
is difficult to deny the effectiveness of death in a finite, heterogeneous ecosystem. 
Abstract. The error threshold (a notion from molecular evolution) 
is the critical mutation rate beyond which structures obtained by the 
evolutionary process are destroyed more frequently than selection can 
reproduce them. We argue that this notion is closely related to the more 
familiar notion of optimal mutation rates in Evolutionary Algorithms 
(EAs). This correspondence has been intuitively perceived before ([9], 
[11]). However, no previous study, to our knowledge, has been aimed at 
explicitly testing the hypothesis of such a relationship. Here we propose 
a methodology for doing so. Results on a restricted range of fitness land- 
scapes suggest that these two notions are indeed correlated. There is not, 
however, a critically precise optimal mutation rate but rather a range 
of values producing similar near-optimal performance. When recombi- 
nation is used, both error thresholds and optimal mutation ranges are 
lower than in the asexual case. This knowledge may have both theoret- 
ical relevance in understanding EA behavior, and practical implications 
for setting optimal values of evolutionary parameters. 



1 Introduction 

The error threshold — a notion from molecular evolution — is the critical mu- 
tation rate beyond which structures obtained by the evolutionary process are 
destroyed more frequently than selection can reproduce them. With mutation 
rates above this critical value, an optimal solution would not be stable in the 
population, i.e., the probability that the population loses these structures is no 
longer negligible. On the other hand, an optimal mutation rate, a more familiar 
notion within the EA community, is the mutation rate that solves 
a specified search or optimization problem with optimal efficiency, that is, with 
the least number of generations or function evaluations. 

The notion of error threshold seems to be intuitively related to the idea of an 
optimal balance between exploitation and exploration in genetic search. In this 
sense, we argue that optimal mutation rates are related to error thresholds. The 
aim of this paper is to test this hypothesis using an empirical approach together 
with knowledge from molecular evolution theory. 
Optimal parameter settings have been the subject of numerous studies within 
the EA community [2], [6], [17], and particular emphasis has been placed on 
finding optimal mutation rates [9]. There is, however, no conclusive agreement 
on what is best; most people use what has worked well in previously reported 
cases. It is very difficult to formulate general a priori principles about parameter 
settings, in view of the variety of problem types, encodings, and performance 
criteria possible in different applications. Our hypothesis, that optimal mutation 
rates are correlated with error thresholds, promises practical relevance and useful 
guidelines for finding optimal parameter settings, thus enhancing 
evolutionary search. 

In the remainder of the paper we summarize the knowledge from molecular 
evolution relevant to our argument: the notions of quasispecies and error thresh- 
olds; we discuss the relation between error thresholds and optimal mutation 
rates; and we describe the fitness landscape used for our experiments: the Royal 
Staircase functions. Thereafter, we describe the empirical methodology used to 
test the hypothesis under study, we present the experimental results obtained, 
and we discuss the insight gained. 



2 Quasispecies and Error Thresholds 

The concept of a ‘quasi-species’ was developed in the context of polynucleotide 
replication, and in particular in studies of early RNA evolution [3], [4], [5]. A 
protein space [12], or more generally a sequence space, can be modelled as the space 
of all possible sequences of length ν drawn from a finite alphabet of size A. 
Each sequence has a fitness value which specifies its replication rate, or expected 
number of offspring per unit time. The fitnesses of all A^ν possible sequences 
define a ‘fitness landscape’. When A = 2, a binary alphabet, the fitness 
landscape is equivalent to specifying fitness values at each vertex of a ν-dimensional 
hypercube; with some mathematical imagination (and some caution) this 
can be pictured as spread out over a geographical landscape where fitness is 
analogous to height, and the dynamics of evolution of a population corresponds 
to movement of the population over such a landscape. 

Given an infinite population, and a specified mutation rate governing errors 
in (asexual) replication, one can determine the stationary sequence distribution 
reached after any transients from some original distribution have died away 
[4]. Unless the mutation rate is too large or differences in fitnesses too small, 
the population will typically cluster around the fittest sequence(s), forming a 
concentrated cloud; the average Hamming distance between two members of 
such a distribution drawn at random will be relatively small. Such a clustered 
distribution is called a ‘quasi-species’. As the mutation rate is increased, the 
local distribution widens and ultimately loses its hold on the local optimum. 
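The clustering criterion mentioned above (the average Hamming distance between two randomly drawn members) can be computed directly. This sketch averages over all unordered pairs of distinct members, which is one reasonable reading of "drawn at random"; for comparison, a population spread uniformly over all binary strings of length ν has an expected pairwise distance of ν/2, since each locus differs with probability 1/2.

```python
from itertools import combinations

def hamming(a, b):
    """Number of loci at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def mean_pairwise_hamming(population):
    """Average Hamming distance over all unordered pairs of members."""
    pairs = list(combinations(population, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)
```

A quasi-species shows a value much smaller than ν/2; as the mutation rate rises past the threshold, the value drifts toward ν/2.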

This can be seen at its clearest in an extreme form of fitness landscape, 
which contains a single peak of fitness σ > 1, all other sequences having a fitness 
of 1. With an infinite population there is a phase transition at a particular error 
rate p, the mutation rate at each of the ν loci in a sequence. In [5], this critical 
error rate (the error threshold) is determined analytically (Equation 1); it 
is defined as the rate above which the proportion of the infinite population on 
the peak drops to chance levels. 

$$p \approx \frac{\ln \sigma}{\nu} \qquad (1)$$

In Equation 1, σ represents the selective advantage of the master sequence 
over the rest of the population, and ν the chromosome length. In the simplest 
case σ is the ratio of the master sequence reproduction rate (fitness) to the 
average reproduction rate of the rest. 
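The origin of the infinite-population threshold can be illustrated numerically. This sketch uses the standard single-peak simplification of neglecting back mutations onto the peak (our assumption, not a statement of the method in [5]): with copying fidelity Q = (1 - p)^ν, the equilibrium share of the peak is (σQ - 1)/(σ - 1), which vanishes when σQ = 1, i.e. at p = 1 - σ^(-1/ν), approximately ln σ / ν for small p. Under this simplification the share falls to zero rather than to "chance levels".

```python
import math

def master_fraction(sigma, nu, p):
    """Equilibrium share of an infinite population on the single peak,
    neglecting back mutation onto the peak."""
    q = (1.0 - p) ** nu  # probability of an error-free copy
    return max(0.0, (sigma * q - 1.0) / (sigma - 1.0))

def exact_threshold(sigma, nu):
    """Mutation rate at which sigma * (1 - p) ** nu == 1."""
    return 1.0 - sigma ** (-1.0 / nu)

def approx_threshold(sigma, nu):
    """Small-p approximation ln(sigma) / nu."""
    return math.log(sigma) / nu
```

For σ = 2 and ν = 10 the exact root is about 0.067 and the approximation about 0.069; the two converge as ν grows.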



2.1 Error Thresholds In Finite Populations 

In [14] the calculations of an error threshold for infinite asexually replicating 
populations are extended to finite populations (we shall call the critical rate p_M 
for a population of size M). Finite populations lose their grip on the solitary spike 
of superior fitness more easily, because of the added hazard of natural fluctuations. 
In [15], we derived a reformulation of the Nowak and Schuster 
analytical expression. This new expression (Equation 2) explicitly approximates 
the extent of the reduction in the error threshold as we move from infinite to 
finite populations. Strictly, the expression should be an infinite series in which 
successive terms get smaller; here, we ignore all but the first few terms: 



$$p_M \approx \frac{\ln \sigma}{\nu} - \frac{2\sqrt{\sigma - 1}}{\nu \sqrt{M}} + \frac{2 \ln(\sigma) \sqrt{\sigma - 1}}{\nu^{2} \sqrt{M}} \qquad (2)$$
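The finite-population threshold can be evaluated directly. The sketch below encodes our reading of the three terms of the expression (the minus sign on the second term, and the plus sign and the ν² in the third term, are a reconstruction of a garbled original, so treat them as an assumption); its output can be compared with the analytical column of Table 1.

```python
import math

def finite_pop_threshold(sigma, nu, M):
    """p_M ~ ln(sigma)/nu - 2 sqrt(sigma-1)/(nu sqrt(M))
             + 2 ln(sigma) sqrt(sigma-1)/(nu**2 sqrt(M))."""
    root = math.sqrt(sigma - 1.0)
    return (math.log(sigma) / nu
            - 2.0 * root / (nu * math.sqrt(M))
            + 2.0 * math.log(sigma) * root / (nu ** 2 * math.sqrt(M)))
```

For σ = 2, ν = 10 and M = 100 this gives about 0.0507, below the infinite-population value ln 2 / 10 of about 0.0693, which is the expected direction of the finite-population correction.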



3 Error Thresholds and Optimal Mutation Rates 

The notion of error threshold seems to be intuitively related to the idea of an 
optimal balance between exploitation and exploration in genetic search. Too low 
a mutation rate implies too little exploration; in the limit of zero mutation, suc- 
cessive generations of selection remove all variety from the population, and once 
the population has converged to a single point in genotype space all further 
exploration ceases. On the other hand, mutation rates can clearly be too high; 
in the limit where mutation places a randomly chosen allele at every locus on an 
offspring genotype, the evolutionary process degenerates into random 
search with no exploitation of the information acquired in preceding generations. 

Any optimal mutation rate must lie between these two extremes, but its 
precise position will depend on several factors including, in particular, the structure 
of the fitness landscape. It can, however, be hypothesized that where evolution 
proceeds through a successive accumulation of information, a mutation rate 
close to the error threshold is optimal for the landscape under 
study, since this should maximise the search done through mutation subject to 
the constraint of not losing information already gained. The main purpose of 
our paper is to test this hypothesis empirically (section 5). 
Some biological evidence supports the relationship between error thresholds 
and optimal mutation rates. Eigen and Schuster [5] have pointed out that viruses, 
which are very efficiently evolving entities, live within and close to the error 
thresholds given by the known rates of nucleotide mutation. This 
correspondence has also been noticed before in the GA community: Hesser and Männer 
[9] devised a heuristic formula for the optimal setting of mutation rates inspired by 
previous work on error thresholds [14]; Kauffman [11] (p. 107) also suggests a 
relationship between these two notions. 

4 Royal Staircase Fitness Functions 

van Nimwegen and Crutchfield [19] proposed the Royal Staircase functions for 
analyzing epochal evolutionary search. This class of functions is related to the 
earlier Royal Road functions [13]. In [19] the authors justify their particular 
choice of fitness function both in terms of biological motivations and in terms of 
artificial evolution issues. In short, many biological systems and artificial 
evolution problems have highly degenerate genotype-to-phenotype maps; that is, the 
mapping from genetic specification to fitness is a many-to-one function. 
Consequently, the number of different fitness values that genotypes can take is much 
smaller than the number of different genotypes. Moreover, due to its high 
dimensionality, genotype space can break up into networks of “connected” 
sets of equal-fitness genotypes that can reach each other via elementary genetic 
variation steps such as point mutation. These connected subsets of iso-fitness 
genotypes are referred to as “neutral networks” [10]. 

Our paper is guided by the working hypothesis that many real search 
problems have genotype search spaces that decompose into a number of such neutral 
networks. Such neutrality has been observed in problem domains as diverse as 
molecular folding [18], evolvable hardware [8], and evolutionary robotics [7]. One 
symptom of evolutionary search where neutral networks are important is 
long periods of (sometimes noisy) fitness stasis (search along a neutral 
network) punctuated by occasional fitness leaps (transitions to a higher 
neutral network). The Royal Staircase class of fitness functions captures the essential 
elements discussed above and is suitable for evaluating our hypothesis. The 
functions are defined as follows [19]: 

1. Genotypes are specified by binary strings $s = s_1 s_2 \cdots s_L$, $s_i \in \{0,1\}$, 
of length $L = NK$. 

2. Starting from the first position, the number $I(s)$ of consecutive 1s in a string 
is counted. 

3. The fitness $f(s)$ of string $s$ with $I(s)$ consecutive ones, followed by a zero, 
is $f(s) = 1 + \lfloor I(s)/K \rfloor$. The fitness is thus an integer between 1 and $N + 1$, 
corresponding to 1 plus the number of consecutive fully-set blocks starting 
from the left. 

4. The single global optimum is $s = 1^L$; namely, the string of all 1s. 

Fixing N (number of blocks) and K (bits per block) determines a particular 
problem or fitness landscape. 
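The definition translates directly into code. A minimal sketch, assuming genotypes are represented as Python strings of '0' and '1' characters:

```python
def staircase_fitness(s, N, K):
    """Royal Staircase fitness f(s) = 1 + floor(I(s) / K), where I(s) is the
    number of consecutive 1s counted from the first position."""
    if len(s) != N * K or set(s) - {"0", "1"}:
        raise ValueError("s must be a binary string of length N*K")
    leading_ones = len(s) - len(s.lstrip("1"))  # I(s)
    return 1 + leading_ones // K
```

Only complete blocks count, so every string whose leading run of 1s falls inside the same block shares one fitness value; these iso-fitness sets are the neutral networks discussed above.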
5 Experimental Design 

The approach taken here is to assess error thresholds and optimal mutation 
rates independently, and then to compare these two measures. 

For the experiments, we used Royal Staircase functions (section 4). The 
rationale for this choice is two-fold. First, we agree with their proposers [19] 
that these functions, despite their simplicity, have some ingredients encountered 
in evolutionary search problems. Secondly, Staircase functions have a 
step feature similar to that of single peak landscapes, for which theoretical results 
on error thresholds are available. Error thresholds can be extended to other 
landscapes; however, a degree of ruggedness is needed. 

We used a generational GA with fitness proportional selection and without 
elitism. Fitness functions were Staircase functions. Specifically, we tested 6 
different functions (choices of N and K): N = 1-3 with K = 10, and N = 4-6 with K = 5. 
Population size was 100; the genetic operators were standard bit mutation and 
two-point crossover with a rate of 0.6. Several mutation rates were tested, from 0.0 
to 0.2 expected mutations per bit. The algorithm was run in two modes: Asexual, 
using mutation only; and Sexual, using both mutation and recombination. Each 
run lasted a maximum of 5000 generations. 
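A single generation of the GA described above might look as follows. This is a minimal sketch under our own assumptions, not the code used for the experiments: genotypes are strings of '0'/'1', parents are drawn with replacement by roulette-wheel (fitness-proportional) selection, there is no elitism, and crossover is applied to each parent pair with probability 0.6 in the sexual mode only.

```python
import random

def two_point_crossover(a, b, rng):
    """Swap the segment between two distinct random cut points."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def mutate(s, rate, rng):
    """Standard bit mutation: flip each bit independently with probability rate."""
    return "".join(("1" if c == "0" else "0") if rng.random() < rate else c
                   for c in s)

def next_generation(pop, fitness, mut_rate, rng, sexual=True, cx_rate=0.6):
    fits = [fitness(s) for s in pop]  # Staircase fitness is always >= 1
    children = []
    while len(children) < len(pop):
        a, b = rng.choices(pop, weights=fits, k=2)  # fitness-proportional
        if sexual and rng.random() < cx_rate:
            a, b = two_point_crossover(a, b, rng)
        children.extend([mutate(a, mut_rate, rng), mutate(b, mut_rate, rng)])
    return children[:len(pop)]  # generational replacement, no elitism
```

Passing `sexual=False` gives the asexual mode; any positive-valued fitness function, such as a Staircase fitness with fixed N and K, can be supplied as `fitness`.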

5.1 Empirically Determined Optimal Mutation Rates 

For the purpose of this paper, we defined the optimal mutation rate as that 
which finds the peak (on average) with the least number of generations. 

For determining optimal mutation rates as defined above, we ran the GA 
starting from a random population, and stored the generation number at which 
the peak was attained for the first time. This measure was averaged over 100 
trials for each mutation rate tested. 
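The measurement in this subsection can be wrapped around any GA step function. This is a hypothetical harness, assuming a `step` callable that maps one population to the next and a `make_population` callable that returns a fresh random population for each trial. The text does not say how runs that never reach the peak within 5000 generations were averaged, so the sketch reports them separately rather than guessing.

```python
from statistics import mean

def generations_to_peak(step, population, is_peak, max_gens=5000):
    """Generation at which some member first reaches the peak, or None."""
    for gen in range(max_gens + 1):
        if any(is_peak(s) for s in population):
            return gen
        if gen < max_gens:
            population = step(population)
    return None

def mean_generations(step, make_population, is_peak, trials=100, max_gens=5000):
    """(mean first-hit generation over successful trials, number of failures)."""
    hits = [generations_to_peak(step, make_population(t), is_peak, max_gens)
            for t in range(trials)]
    found = [h for h in hits if h is not None]
    return (mean(found) if found else None), hits.count(None)

def optimal_rate(mean_gens_by_rate):
    """Mutation rate with the smallest mean number of generations to the peak."""
    return min(mean_gens_by_rate, key=mean_gens_by_rate.get)
```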



5.2 Empirically Determined Error Thresholds 

The error threshold is the critical mutation rate beyond which structures ob- 
tained by the evolutionary process are destroyed more frequently than selection 
can reproduce them. Aiming at capturing this definition in an algorithm, we 
designed the following method for empirically estimating error thresholds: 

For each mutation rate in the selected range: 

- Start from a population of all 1s, that is, with all members on the peak. 

- Run the GA for a maximum of 5,000 generations or until the whole 
population has completely lost the peak. 

- Count how many times, out of 100 trials, the population completely loses 
the peak. 

For low mutation rates, at least one member of the population remains on the
peak in every generation of all 100 trials. As the mutation rate is increased,
there comes a point at which the population completely loses the peak in some
or all of the 100 trials. The error threshold is identified as the mutation rate
at which this transition occurs. The estimates are approximate because, first,
precision is limited by the mutation step size used and, second, the limit of
5,000 generations was assumed to be sufficiently long for the purpose.
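A hedged sketch of this estimation procedure follows. `run_trial(rate, rng)` is a hypothetical callable that starts a GA with every member on the peak and reports whether the peak was completely lost within the generation cap (for instance by calling `loses_peak` with a GA step function). Reading "the transition" as the lowest tested rate with at least one loss is our interpretation of the text.

```python
import random

def loses_peak(pop, step, on_peak, max_gens=5000):
    """True if, within max_gens generations, no member is left on the peak."""
    for _ in range(max_gens):
        pop = step(pop)
        if not any(on_peak(ind) for ind in pop):
            return True
    return False

def error_threshold(rates, run_trial, trials=100, seed=0):
    """Lowest rate at which the peak is completely lost in at least one trial.

    Returns (threshold or None, number of losses per rate)."""
    rng = random.Random(seed)
    losses = {}
    threshold = None
    for rate in sorted(rates):
        losses[rate] = sum(run_trial(rate, rng) for _ in range(trials))
        if threshold is None and losses[rate] > 0:
            threshold = rate
    return threshold, losses
```

As the text notes, the result can be no more precise than the spacing of the tested rates.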



Validating the Empirical Method. To validate the empirical method described
above, we designed the following experiment: we considered a single-peak
landscape with σ = 2 and, for several string lengths (10, 25, 50, 100, and 200),
calculated the error threshold analytically using equation 2 (valid for finite
asexual populations). The population size, M, was 100. Results of these
calculations are shown in the second column of Table 1. Next, for the same
landscape and settings, we estimated the error thresholds empirically following
our method. Results of these estimations are shown in the third column of the
table. There is reasonable agreement between the analytical and empirical
figures, though agreement is worst at short string lengths. The empirical figures
were always higher than the analytical ones, which is related to the limited
number of generations used: the higher the number of generations, the lower one
should expect the empirical figure to be, since the analytical figure assumes an
infinite number of generations.



Table 1. Comparing Analytical and Empirical Error Thresholds on a Single Peak
Landscape

String Length   Analytical   Empirical
10              0.05         0.11
25              0.02         0.03
50              0.01         0.015
100             0.005        0.006
200             0.003        0.004
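As a quick arithmetic check of the agreement described above, the ratio of each empirical threshold to its analytical counterpart can be computed directly from Table 1. The snippet only restates the table's figures.

```python
# (analytical, empirical) error thresholds from Table 1, keyed by string length
table1 = {10: (0.05, 0.11), 25: (0.02, 0.03), 50: (0.01, 0.015),
          100: (0.005, 0.006), 200: (0.003, 0.004)}

ratios = {length: emp / ana for length, (ana, emp) in table1.items()}
for length, ratio in ratios.items():
    print(f"L = {length:>3}: empirical / analytical = {ratio:.2f}")
```

Every ratio exceeds 1, consistent with the finite generation limit. The largest (2.2) occurs at string length 10, and the rest fall between 1.2 and 1.5.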



6 Results 

Figure 1 summarizes experimental results for the six landscapes studied. The
curves show the number of generations to reach the global peak as a function
of the mutation rate, for asexual and sexual populations. Each data point gives
the number of generations for finding the peak averaged over 100 runs. Optimal
mutation rates are those which find the peak in the fewest generations. Two
general trends may be observed if one excludes the first result shown. First,
there is not a single critically precise optimal mutation rate, but instead a
range of mutation values producing near-optimal results: the curves are U-shaped
with a flat bottom. Second, the curves for sexual populations are shifted to the
left, that is, to lower mutation values, when compared to the asexual population
curves.
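Because the curves have flat bottoms, it can be more informative to report a near-optimal range than a single best rate. The sketch below assumes a hypothetical fractional tolerance (the paper sets no numerical criterion) and returns the lowest and highest tested rates whose mean generations-to-peak lie within that tolerance of the best; `inside` then checks whether a given rate, such as an estimated error threshold, falls in that range.

```python
def near_optimal_range(mean_gens, tolerance=0.1):
    """(lowest, highest) rate whose mean generations-to-peak is within
    `tolerance` (a fraction, hypothetical) of the minimum."""
    best = min(mean_gens.values())
    ok = sorted(rate for rate, g in mean_gens.items() if g <= (1 + tolerance) * best)
    return ok[0], ok[-1]

def inside(rate_range, rate):
    """True if `rate` lies within the (lowest, highest) near-optimal range."""
    lo, hi = rate_range
    return lo <= rate <= hi
```

On noisy data the qualifying rates need not be contiguous, so the returned pair is best read as an envelope rather than a guarantee that every rate in between is near-optimal.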

In the plots we indicate the empirically estimated error thresholds for asexual
(solid arrows) and sexual (dotted arrows) populations. Error thresholds for the
landscapes studied were found to be within the range of optimal mutation rates,
for both asexual and sexual populations. The results support the hypothesized
relationship between these two measures.

An exception to these general trends is the plot for N = 1, K = 10. This
scenario is equivalent to a single-peak landscape, where a single string has the
highest fitness and all other strings have the same, lower fitness. The
rationale for hypothesising that error thresholds are correlated with optimal
mutation rates (Section 3) was that of maximising search subject to the
constraint of not losing information already gained. However, this landscape is
an extreme case where there is no intermediate step and no accumulation of
information which might be lost through excessive mutation. The peak is found
randomly without any gradual approach. Here, high mutation rates (close to
1.0) were optimal for both asexual and sexual populations.

Notice that error thresholds for sexual populations were, in all cases, lower
than those for asexual populations. A similar trend is observed for optimal
mutation rates, which is compatible with the hypothesized correlation between
error thresholds and optimal mutation rates.

The experiments determining optimal mutation rates showed standard deviations
of the same order as the average number of generations measured. Thus, there
were large run-to-run variations in the time to reach the optimal string. This
observation was also reported by van Nimwegen and Crutchfield, who performed
similar experiments using the Royal Staircase function [19]. Figure 2 shows
standard deviations for the N = 2, K = 10 landscape.

7 Discussion 

Our results suggest that error thresholds and optimal mutation rates are indeed 
correlated. This empirical evidence supports previous intuitions that such a
correlation exists [9], [11], [15], [16]. There is not, however, a single critically precise
value for the optimal mutation rate, but instead a range of values producing 
near-optimal performance. The error threshold, on the other hand, is a more 
precise measure. Hence, mutation rates slightly lower or higher than estimated 
error thresholds are likely to produce near-optimal results. 

The implication of this finding is two-fold. First, it is of theoretical value in
helping to understand EAs’ behavior, as insights regarding error thresholds will be
reflected in our understanding of optimal mutation rates. Second, it is of practical
value, as heuristics for finding error thresholds will provide useful guidelines for
setting optimal mutation rates, thus improving the performance of EAs.

In our experiments, both error thresholds and optimal mutation ranges were
lower for sexual than for asexual populations. This result has been observed
before: a recent study from the evolutionary biology literature [1] examined the
role of recombination in evolving populations of viruses, particularly its effect
on the magnitude of the error threshold. Its authors report that recombination
shifts the error threshold to lower mutation rates. Moreover, in [16], now in the
realm of genetic algorithms, we found that recombination shifts optimal mutation
rates to lower values. This evidence points indirectly towards the relationship
between the notions of error thresholds and optimal mutation rates.

[Fig. 1 plots: six panels, one per landscape (e.g. N = 6, K = 5); X axis: mutation rate; Y axis: generations to find the peak]

Fig. 1. For the six different landscapes explored, plots show the number of generations
for finding the peak (Y axis) as a function of the mutation rate (X axis; note that the
scale is not linear at the lower end of the axis), for both asexual and sexual populations.
Error thresholds are indicated by solid vertical arrows (asexual) and dotted vertical
arrows (sexual).

[Fig. 2 plot: N = 2, K = 10]

Fig. 2. To demonstrate the significant variance in the results, for one of the landscapes
we show error bars for +/- one standard deviation in the number of generations for
finding the peak.

In this paper, we explored a single family of fitness functions, namely Royal
Staircase functions. We have found, however, compatible results using the
Kauffman NK family of landscapes. Moreover, the methods discussed here can be
applied elsewhere. Error thresholds, and hence the implications of our results,
hold for landscapes with a certain degree of discontinuity. Thus, they are not
applicable to smooth monotonic landscapes (or regions). Results will also be
modified where there is elitism, either explicit elitism or the implicit elitism
of some tournament selection algorithms for steady-state GAs.

Finally, there is a circularity implicit in using the method suggested here for
assessing optimal mutation rates in real problems as opposed to toy landscapes,
namely that one cannot estimate the error threshold until after one has found
the global peak. However, knowledge of error thresholds in one region of the
landscape, or in one member of a class of landscapes, may be of guidance in
assessing error thresholds (and thus optimal mutation rates) elsewhere. For this
to be the case one needs to make some assumptions of statistical regularity, but
such assumptions are necessary anyway for EAs to be practical at all. So despite
this circularity there is the prospect that, in real EA applications where the fitness
landscape is unknown except to the extent that it is sampled experimentally, the
methodology given above allows for experimental assessment of error thresholds
and thus guidance for setting optimal mutation rates.



Acknowledgements Thanks to A. Meier for support and critical reading. 
Are Artificial Mutation Biases Unnatural?
Abstract. Whilst the rate at which mutations occur in artificial
evolutionary systems has received considerable attention, there has been little
analysis of the mutation operators themselves. Here attention is drawn 
to the possibility that inherent biases within such operators might arte- 
factually affect the direction of evolutionary change. Biases associated 
with several mutation operators are detailed and attempts to alleviate 
them are discussed. Natural evolution is then shown to be subject to 
analogous mutation “biases”. These tendencies are explicable in terms of
(i) selection pressure for low mutation rates, and (ii) selection pressure 
to avoid parenting non-viable offspring. It is concluded that attempts 
to eradicate mutation biases from artificial evolutionary systems may 
lead to evolutionary dynamics that are more unnatural, rather than less. 
Only through increased awareness of the character of mutation biases, 
and analyses of our models’ sensitivity to them, can we guard against 
artefactual results. 



This paper explores the potential for artefactual evolutionary simulation re- 
sults to derive from biases inherent within the artificial mutation operators they 
employ. As an example, consider a recent coevolutionary simulation model which 
has suggested that signals exhibiting complex symmetry could evolve merely as 
a side effect of selection for distinctiveness [1]. 

The model involved multicoloured, composite patterns coevolving with simple
artificial neural networks under a mutualist selection regime. Networks which
were able to discriminate signal patterns from distractor patterns were favoured,
as were discriminable signal patterns. This discrimination had to be achieved
despite patterns being presented in various orientations and positions on each
network’s artificial “retina”. After a period of simulated coevolution, the authors
report that networks were able to distinguish signals from distractors almost
perfectly, and that the coevolved signal patterns displayed “marked symmetries”
(p. 171). The authors note that, like many natural displays, the evolved
signals consisted of “purer, brighter colours” (p. 171) than average signals. An
evolutionary-functional account for the evolved symmetry was proposed:
symmetrical signals persist because they are invariant under the various
transformations involved in their presentation, and hence easier to discriminate
from distractors. The evolved signals’ bold coloration was explained as the result of
selection pressure to diverge from random distractor patterns.
However, a replication of the study demonstrated that the complex symmetry
of the evolved signals resulted from an unnaturally structured presentation
regime [2]. Might the boldness of the evolved signals result from a similar
simulation bias? One possible source of such a bias is the simulation’s mutation
operator.

During reproduction, each of the parent signal’s colour components was sub- 
ject to a small chance of mutation. A mutation event, when it occurred, per- 
turbed the parental value by a small increment. Since each colour component 
was coded for by a real value lying in the interval [0, 1], some of these
perturbations resulted in mutant values which lay outside the legal range. The
authors do not report the measures taken to deal with such illegal mutations. However,
a mutation operator which replaced illegal mutant values with the nearest legal 
value might favour extreme colour component values merely through evolution- 
ary drift. Perhaps this, rather than some pressure to deviate from the average, 
explains the simulation’s results? 

How might mutation biases occur in general? What are the possible effects 
of mutation bias? How can we test for the presence of mutation biases, and 
eliminate them from evolutionary simulations? These questions are of practical 
significance for any evolutionary simulation modeller. 



1 Artificial Mutation 

Within individual-based evolutionary computer simulations, artificial mutation 
operators are relied upon to provide the genetic diversity upon which selec- 
tion may act. Simulations involving sexual populations augment mutation with 
crossover operators which allow offspring to inherit passages of genetic mate- 
rial from each parent. However, mutation continues to be the major source of 
genetic novelty in such simulations, since crossover operators reshuffle existing 
genes, rather than introduce new ones. 

Many studies have explored the effect on artificial evolution of employing 
various genetic operators (e.g., one-point or two-point crossover, the transposi- 
tion, repetition and excision of genetic material), varying rates at which these 
operators are employed, and even allowing the types of operators and the rates 
at which they are employed to themselves be subject to adaptation. For example, 
the proceedings of one early conference on genetic algorithms [3] contains five 
papers which between them address all of these concerns. More recent studies 
have developed these ideas [4-6]. 

However, since much of this work takes place within engineering contexts
where there need be little concern for evolutionary plausibility, little attention
has been paid to the biases inherent within even the simplest mutation opera- 
tors. Although the construction of these operators is not typically regarded as a 
complex matter, as will be argued below, they differ significantly from natural 
mutational processes, and their design therefore involves issues which are unique 
to the production of evolutionary simulations. 
2 Legal Bounds in Natural and Artificial Genetic Systems

Consider a phenotypic trait which may vary over some range. We may distinguish 
between two classes of such a trait. Unbounded traits may vary limitlessly. Such 
traits might include the significance afforded to a male display by a female
onlooker, which might take any value from negative infinity (infinitely
repellent) through zero (non-significant) to positive infinity (infinitely
attractive). Bounded traits are those for which the range of legal values is in
some way limited. Many traits have either a lower or an upper limit, and are thus
partially bounded. For example, one cannot have fewer than zero legs. Some traits are bounded at both
extremes, e.g., a trait governing the time of day at which one begins to forage 
might vary between dawn and dusk. Such a notion of boundedness raises the 
associated notion of legal and illegal genotypes, the latter being those that code 
for traits which transgress their legal limits. 

The distinction being made here is an unnatural one which cannot easily be 
applied to natural genetic encodings. For example, although natural genes may 
code for what appears to be a bounded phenotypic trait (e.g., the redness of a 
signal), the manner in which they do so may logically preclude the occurrence of
illegal values for this trait. Additive polygenic traits, for example, might code for 
the varying degree of a phenotypic trait with a varying number of genes. Since 
such a genotype cannot contain a negative number of these genes, it cannot code 
for an illegal phenotypic trait (e.g., a signal with negative redness). 

In one sense, however, an “illegal genotype” is one which does not result in a
viable organism. Clearly, such mutants will not leave offspring. Their genotypes 
will be selected out of the population. This selective process is the same one that 
excludes viable mutants which are less well adapted to their niche than their 
competitors. From the perspective of natural selection, there is no difference 
(ignoring indirect fitness effects) between failing to be born living, failing to 
survive to reproductive age, failing to mate, or failing to reproduce before dying 
of old age: all such failures are awarded zero fitness.

Genotype legality within artificial evolutionary algorithms falls somewhere 
between the accounts given in the previous two paragraphs. It is true that, as 
in the first account, genotypes with illegal trait values are never realised as or- 
ganisms, and are thus never subjected to the same selective pressures as their 
valid conspecifics. However, illegal genotypes are generated by many evolution- 
ary algorithms and are thus not excluded out of logical necessity as in the first 
account, but are selected as invalid, as in the second account. But the grounds 
upon which illegal genotypes are selected are not the same as those which govern 
phenotypic selection. The legality of genotypes is assessed, prior to any morpho- 
genetic, developmental or ontogenetic performance, on the basis of the genotype 
itself, rather than the performance of the associated (unrealisable) phenotype. 
If found to be legal, the genotype is translated into a phenotype and assessed as 
normal. If found to be illegal, some alternative course of action must be taken. 
It is this lack of correspondence between artificial evolutionary algorithms and 
natural evolution which raises the possibility that the character of artificial mu- 
tation operators may be unnatural due to the biases which they introduce. 
3 Mutation Operators

In this section, the biases of various mutation operators will be characterised. 
These operators differ in the manner in which they treat illegal mutant values. 
In order to describe the repercussions of these differences, I will use the terms 
departure rate to denote the relative rate at which a particular parental value is 
altered by a mutation operator, and arrival rate to denote the relative rate at 
which a particular mutant value is generated by a mutation operator. 

First, a distinction must be made between context-free and context-sensitive 
operators. Whilst context-free operators generate mutant values which are in- 
dependent of parental values, context-sensitive operators are predicated on such 
variables. This latter class of operator (which will be the focus of this paper) 
typically generates mutant values which are close to the relevant parental values
in an effort to mimic natural evolution. Many of the issues discussed here will be
most pertinent to either continuous-valued genes or discrete (typically binary)
encodings. However, most are to some degree applicable to both. 

Genes coding for bounded phenotypic traits may be subjected to context-free 
mutations through drawing a random value from the range of legal values avail- 
able for a trait. This operator will be referred to as the “Flat” mutation operator 
throughout this paper. For example, when mutating a trait which governs the 
hirsuteness of an organism, a Flat mutation operator might ignore the degree 
of body hair possessed by the parent and simply assign the mutant offspring a 
random number between 0% and 100%. Such an operator is flat in that both the 
departure rate and arrival rate are uniform across the valid range of the trait. 

Similarly, genes coding for unbounded, or partially bounded, phenotypic 
traits may be mutated independently of their parental value through imposing 
some arbitrary mutant range upon the phenotypic trait and picking a value from 
this mutant range. Such a solution is unsatisfactory, however, as this method 
introduces a mutation bias. Despite there being an equal probability of any 
parental value being mutated (a flat departure rate), there is not an equal prob- 
ability of any legal mutant value being generated by the mutation operator, the 
arrival rate outside the mutant range being zero, whilst that inside this range 
is positive and flat. Such a bias may lead to artefactual results. For example, if
the optimal trait lies outside the mutant range, a population of organisms may
evolve to cluster at the mutant value nearest to this optimum. Although
the population may appear to have converged on some stable phenotype, this 
appearance is an illusion created by the mutation bias. 
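The bias described above can be reproduced with a few lines of Python. This is a minimal sketch under our own assumptions: a trait bounded only below at zero, an arbitrary mutant range of [0, 10], a hypothetical optimum at 15, and truncation selection (chosen for brevity, not taken from any model discussed here). Applied over the full legal range of a fully bounded trait, such as hirsuteness in [0, 100], the same `context_free_mutation` is simply the Flat operator.

```python
import random
from statistics import median

def context_free_mutation(lo, hi, rng):
    """Ignore the parent entirely and draw a mutant uniformly from [lo, hi].
    Over a trait's full legal range this is the Flat operator; over an
    arbitrary sub-range it gives zero arrival rate outside that range."""
    return rng.uniform(lo, hi)

def evolve(optimum, mutant_range=(0.0, 10.0), size=100, generations=200,
           mut_rate=0.1, seed=0):
    rng = random.Random(seed)
    pop = [context_free_mutation(*mutant_range, rng) for _ in range(size)]
    for _ in range(generations):
        # Truncation selection: the half closest to the optimum reproduces twice.
        parents = sorted(pop, key=lambda x: abs(x - optimum))[: size // 2]
        pop = [context_free_mutation(*mutant_range, rng) if rng.random() < mut_rate else p
               for p in parents for _ in range(2)]
    return pop

pop = evolve(optimum=15.0)
print(round(median(pop), 2), max(pop) <= 10.0)
```

In this setting the population's median ends up just below 10, the top of the mutant range, rather than at the optimum of 15: the apparent convergence is produced by the operator, not by selection.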

In contrast, context-sensitive mutation operators which perturb the parental 
genotype by some small amount have the potential to generate illegal values for 
bounded traits. For such mutation operators, decisions concerning the treatment 
of these illegal mutations must be made. There are a number of straightforward 
options that an operator designer might take:

Absorb Illegal mutant values are truncated to the nearest boundary. 

Repeat Mutant values are repeatedly generated, until a legal value is obtained. 




[Fig. 1 plots: upper panels Absorb, Repeat, Replace, Flat; lower panels Ignore, Reflect, Wrap, Deviation from Uniformity; X axis: trait range, binned from 0.05 to 0.95]


Fig. 1. Average distribution of trait values across 20 asexually reproducing popula- 
tions of 1000 single-trait organisms after 5000 generations of evolution on a flat fitness 
surface, for each of seven mutation operators (see text). Traits were real values in 
the range [0, 1]. Mutation events occurred with probability 0.01, and consisted of real- 
valued perturbations drawn from a normal distribution with zero mean and standard 
deviation 0.2. Absorb, Repeat and Replace show clear deviation from uniformity in 
their aggregate performance. A Flat, context-free operator is shown for comparison. 
Three of the lower panels suggest that over many runs the behaviour of Ignore, Reflect 
and Wrap is roughly equivalent to Flat. However, the fourth lower panel shows that 
their mean deviations from uniformity (see text) differ significantly from each other, 
and from that of traits evolved under a Flat mutation operator. All graphs show mean 
frequencies with attendant standard errors. 



Replace Any offspring for which illegal trait values are generated is replaced
by a new offspring, re-choosing parents.

Each of these three operators results in mutation biases which either favour or
resist extreme-valued traits (Fig 1). The “Absorb” operator will effect a constant
departure rate across the range of trait values, save that it falls to half the 
nominal mutation rate at the extremes of the trait’s legal range. In addition, 
the arrival rates at these extremes are increased by their absorbent nature. This 
ensures that trait values near the legal boundaries of the trait will tend to reach 
them and be kept there. Conversely, the “Repeat” operator will maintain a 
constant departure rate across the legal range of trait values, but will tend 
to mutate extreme parental values away from the boundaries of the trait. This 
ensures that arrival rates decrease as trait values approach their legal limits. The 
“Replace” operator, although seemingly the most accurate reflection of natural 
mutation processes, in that illegal or non-viable offspring are simply rejected, 
results in a selection pressure that resists the evolution of extreme trait values due 
to the increased likelihood that extreme genotypes will generate illegal offspring. 
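The three operators can be sketched as follows, using the settings given in the Fig. 1 caption (a trait in [0, 1], Gaussian perturbations with standard deviation 0.2, mutation probability 0.01) and uniform parent choice on a flat fitness surface. The function names are hypothetical. `departure_fraction` estimates, for a given parental value, how often a mutation event actually changes it, which lets one check the claim that Absorb's departure rate halves at the boundaries.

```python
import random

LO, HI = 0.0, 1.0  # legal range of the trait (as in Fig. 1)
SD = 0.2           # s.d. of the Gaussian perturbation (as in Fig. 1)

def absorb(parent, rng, sd=SD):
    """Absorb: illegal mutant values are truncated to the nearest boundary."""
    return min(HI, max(LO, parent + rng.gauss(0.0, sd)))

def repeat(parent, rng, sd=SD):
    """Repeat: redraw the perturbation until the mutant value is legal."""
    while True:
        child = parent + rng.gauss(0.0, sd)
        if LO <= child <= HI:
            return child

def replace_generation(pop, rng, mut_rate=0.01, sd=SD):
    """Replace: an offspring with an illegal value is discarded and a new
    offspring is bred from a freshly chosen parent (flat fitness)."""
    offspring = []
    while len(offspring) < len(pop):
        child = rng.choice(pop)
        if rng.random() < mut_rate:
            child += rng.gauss(0.0, sd)
        if LO <= child <= HI:
            offspring.append(child)
    return offspring

def departure_fraction(op, parent, trials=20000, seed=0):
    """Fraction of mutation events under `op` that change the parental value."""
    rng = random.Random(seed)
    return sum(op(parent, rng) != parent for _ in range(trials)) / trials

print(departure_fraction(absorb, 0.0), departure_fraction(absorb, 0.5))
```

For a parent at 0.0, roughly half of all Absorb mutation events are truncated back to 0.0, whereas at 0.5 virtually every event moves the value; Repeat always moves it.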
Several mutation operators might be constructed specifically to alleviate these
edge-effect biases. 
Ignore Mutation events which transgress legal bounds are ignored. Rather than
inherit an illegal mutant value, offspring inherit the parental value.

Reflect Mutant values lying a distance of x above (or below) the legal range
are replaced by values a distance of x below (or above) the nearest boundary.

Wrap The trait is treated as if it were periodic. The edges of its legal range
“wrap” around. Mutant values are calculated modulo the trait’s range.
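Under the same assumed settings (legal range [0, 1], Gaussian perturbation with standard deviation 0.2), the three bias-alleviating operators might be written as below; the names are hypothetical. The repeated folding in `reflect` is our addition to cover perturbations large enough to overshoot both boundaries, since the text specifies only a single reflection.

```python
import random

LO, HI = 0.0, 1.0  # legal range of the trait (as in Fig. 1)
SD = 0.2           # s.d. of the Gaussian perturbation (as in Fig. 1)

def ignore(parent, rng, sd=SD):
    """Ignore: an illegal mutant is discarded and the parental value inherited."""
    child = parent + rng.gauss(0.0, sd)
    return child if LO <= child <= HI else parent

def reflect(parent, rng, sd=SD):
    """Reflect: an overshoot of x beyond a bound lands x inside that bound."""
    child = parent + rng.gauss(0.0, sd)
    while not LO <= child <= HI:  # repeated folding is our addition for huge steps
        child = 2 * HI - child if child > HI else 2 * LO - child
    return child

def wrap(parent, rng, sd=SD):
    """Wrap: the trait is periodic; mutants are taken modulo the range."""
    return LO + (parent + rng.gauss(0.0, sd) - LO) % (HI - LO)

rng = random.Random(0)
samples = [op(rng.random(), rng) for op in (ignore, reflect, wrap) for _ in range(1000)]
print(min(samples) >= LO and max(samples) <= HI)  # True
```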

The aggregate behaviour of these operators is uniform (Fig 1). However, they 
differ in how they achieve this aggregate uniformity. For example, under the 
“Ignore” regime, more illegal values will be generated (and ignored) for parental 
values near the extremes of the legal range. This ensures that the departure rate 
will decrease as parental values approach legal extremes. However, this reduction 
in departure rate is prevented from systematically biasing the population by a 
correlated reduction in arrival rate. To the extent that a trait value x is extreme 
and hence suffers a low departure rate, it will be in an extreme neighbourhood 
also suffering a low departure rate. Since it is through mutations affecting parents 
within this neighbourhood that x is likely to arise, this low departure rate will 
lower x’s arrival rate. As a result, extreme values are less likely to be generated
by the operator, but when generated, are less likely to be mutated. Whilst this 
balance ensures that populations are not systematically biased toward or away 
from some areas of the trait space, it also results in individual distributions with 
a certain character. In order to determine whether, and to what extent, this 
character differs from that of populations under different mutation regimes, a 
measure of deviation from uniformity was calculated. 

A value estimating the degree of deviation from a uniform expected distri- 
bution was calculated for each individual evolved population contributing to the 
aggregate distributions graphed in Fig 1. The mean value for each operator 
is shown in the lower far-right panel. This measure of deviation from uniformity 
differentiates between the three pseudo-flat operators, demonstrating that de- 
spite their similar aggregate performance, they each exert a unique influence on 
the character of individual populations. In addition, this metric demonstrates 
that aggregate flat performance can disguise individual distributions which tend 
to be very far from flat. The Ignore operator, for example, generates distribu- 
tions which are on average less flat than either Repeat or Replace, despite there 
being no systematic bias to this non-uniformity. 
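The text does not spell out its deviation measure, so the sketch below substitutes a chi-square statistic over ten equal bins as a hypothetical stand-in. `drift` evolves one asexual population on a flat fitness surface under a given operator, with defaults taken from the Fig. 1 caption; the uniform initial population is our assumption. At those defaults a pure-Python run is slow, so the usage lines use smaller populations. Only Absorb and Wrap are included to keep the block short.

```python
import random

def absorb(x, rng, sd=0.2):
    """Absorb: truncate illegal mutants to the nearest boundary of [0, 1]."""
    return min(1.0, max(0.0, x + rng.gauss(0.0, sd)))

def wrap(x, rng, sd=0.2):
    """Wrap: treat [0, 1) as periodic."""
    return (x + rng.gauss(0.0, sd)) % 1.0

def drift(op, size=1000, generations=5000, mut_rate=0.01, seed=0):
    """One asexual population on a flat fitness surface (parents drawn uniformly)."""
    rng = random.Random(seed)
    pop = [rng.random() for _ in range(size)]  # uniform start (assumption)
    for _ in range(generations):
        pop = [rng.choice(pop) for _ in range(size)]
        pop = [op(x, rng) if rng.random() < mut_rate else x for x in pop]
    return pop

def deviation_from_uniformity(pop, bins=10):
    """Hypothetical stand-in measure: chi-square distance of the binned
    trait values from a flat expectation (0 means perfectly flat)."""
    counts = [0] * bins
    for x in pop:
        counts[min(int(x * bins), bins - 1)] += 1
    expected = len(pop) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

# Mean deviation over a few small runs per operator (reduced sizes for speed).
for name, op in (("Absorb", absorb), ("Wrap", wrap)):
    devs = [deviation_from_uniformity(drift(op, size=200, generations=300, seed=s))
            for s in range(5)]
    print(name, sum(devs) / len(devs))
```

Averaging the measure over independent runs, rather than pooling their trait values, is what separates operators whose aggregate histograms all look flat.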

4 Discrete Encodings 

Two kinds of discrete encoding must be distinguished. Both allow a trait to take 
one of a (possibly infinite) number of discrete values. The first treats these values 
as unitary wholes for the purposes of mutation (and crossover), whereas the sec- 
ond represents these values with a number of discrete digits, each independently 
exposed to the possibility of mutation. Whilst the issues discussed above apply 
fairly straightforwardly to the first kind of discrete encoding, the second kind 
raises new issues. 
Fig. 2. Two three-bit Gray codes expressed over (a) the canonical Hamiltonian circuit,
and (b) a mere Hamiltonian path. An n-bit canonical Gray code may be constructed
by reversing the (n-1)-bit canonical Gray code which it contains (dashed box), and
appending a 0 to the first 2^(n-1) bit strings and a 1 to the remaining bit strings.




Fig. 3. Left: frequency distribution of mutation sizes for an 8-bit Gray-coded trait.
Right: distributions of mean mutation destination by parental trait value for Gray, 
Binary and Flat mutation operators. Discontinuities could impede a population’s evo- 
lution. 



Encoding traits as a vector of discrete digits, and moving between trait values 
through manipulations at the level of these individual digits, imposes a structure 
on the mutation space of a model. There are a^n possible values represented by
an n-dimensional code utilising an alphabet of a symbols, but each individual
vector has only n(a - 1) immediate neighbours. As a result, the code’s
neighbourhood relationships may systematically bias evolutionary change in
particular ways.
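The gap between the size of the code space and the size of each neighbourhood is easy to check by enumeration. The sketch uses a hypothetical four-symbol alphabet and vectors of length three.

```python
from itertools import product

def neighbours(vector, alphabet):
    """Vectors reachable from `vector` by changing exactly one digit."""
    return [vector[:i] + (s,) + vector[i + 1:]
            for i in range(len(vector)) for s in alphabet if s != vector[i]]

alphabet, n = "abcd", 3  # a = 4 symbols, n = 3 digits
space = list(product(alphabet, repeat=n))
print(len(space), len(neighbours(space[0], alphabet)))  # 64 9, i.e. a**n and n*(a - 1)
```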

Traits encoded as groups of binary digits which are interpreted as phenotypic 
values are typically mutated via random bit flips. When such a mutation operator 
acts upon conventional binary numbers, mutated phenotypic traits are poorly 
correlated with parental phenotypic traits, since a single bit-flip may result in 
a large change in the value for which the bit-string codes. Gray coding [7], 
through ensuring that consecutive integers are coded for by adjacent binary 
strings, increases the correlation between mutants and their parents. In addition 
to sharing the general appeal of context-sensitive mutation operators, this coding 
scheme is advocated by genetic algorithm designers, who argue that it makes 
evolutionary search more effective [8].
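The difference between conventional binary and Gray coding can be made concrete with a short sketch (function names hypothetical). Under plain binary, moving from 127 to 128 requires all eight bits to flip, so no single bit-flip mutation connects these neighbouring values; under the canonical Gray code every pair of consecutive integers is one bit apart.

```python
def to_gray(b):
    """Canonical Gray code of the integer b."""
    return b ^ (b >> 1)

def from_gray(g):
    """Inverse of to_gray: XOR of all right shifts of g."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

def hamming(x, y):
    """Number of differing bits."""
    return bin(x ^ y).count("1")

def worst_step(n_bits, encode):
    """Most bit flips needed to move between consecutive integers under `encode`."""
    return max(hamming(encode(v), encode(v + 1)) for v in range(2 ** n_bits - 1))

print(worst_step(8, lambda v: v), worst_step(8, to_gray))  # 8 1
```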

Constructing an n-bit Gray code can be thought of as assigning each of the 2^n
values to a vertex of an n-dimensional binary hypercube such that adjacent
values are assigned to adjacent vertices lying on one of the hypercube’s
Hamiltonian paths (paths which visit every vertex exactly once).
Typical Gray code algorithms [8,9] use a Hamiltonian circuit, in which the
terminal vertices are also adjacent (Fig 2). It is unclear to what extent the claims
of improved performance under Gray code schemes can be generalised from this
canonical Gray code to other instances [8].
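The reflected construction of the canonical code, and the distinction between a Hamiltonian circuit and a mere path, can be checked with the sketch below. The non-circuit path is our own example (from 000 to 111), not necessarily the one drawn in Fig. 2(b), and the function names are hypothetical.

```python
def reflected_gray(n):
    """Canonical (binary-reflected) n-bit Gray code, as a list of bit strings."""
    if n == 0:
        return [""]
    prev = reflected_gray(n - 1)
    return ["0" + s for s in prev] + ["1" + s for s in reversed(prev)]

def one_bit_apart(a, b):
    return sum(x != y for x, y in zip(a, b)) == 1

def is_hamiltonian_path(code):
    """Visits every n-bit vertex exactly once, moving along hypercube edges."""
    n = len(code[0])
    return (len(set(code)) == len(code) == 2 ** n
            and all(one_bit_apart(a, b) for a, b in zip(code, code[1:])))

def is_hamiltonian_circuit(code):
    """A Hamiltonian path whose terminal vertices are also adjacent."""
    return is_hamiltonian_path(code) and one_bit_apart(code[-1], code[0])

canonical = reflected_gray(3)
# Hypothetical alternative: a Hamiltonian path from 000 to 111 that is not a circuit.
path_only = ["000", "001", "011", "010", "110", "100", "101", "111"]
print(canonical)  # ['000', '001', '011', '010', '110', '111', '101', '100']
print(is_hamiltonian_circuit(canonical), is_hamiltonian_circuit(path_only))  # True False
```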

Though Gray codes ensure that there always exists a mutation event capable 
of perturbing a trait’s value by a unit, the character of the remaining distribution 
of single-bit mutation events is unspecified. In fact, the canonical Gray code 
achieves a distribution of mutation events which decays roughly exponentially 
with mutation size (calculated as the absolute difference between pre-mutation 
and post-mutation trait values). This distribution exhibits strong discontinuities 
(Fig 3); there are no mutation events of even-valued magnitude. This implies
that any mutation event changes both the parity and magnitude of the inherited 
value. For instance, if the former determined an organism’s handedness whilst the 
latter coded for degree of handedness, this mutation bias would ensure that any 
mutant would exhibit both a (typically small) change in the degree of handedness 
and a change in which hand was preferred. Care must thus be taken to ensure that 
the constraints on mutation imposed by a discrete encoding do not systematically 
interfere with evolutionary change. 
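The distribution behind Fig. 3 (left) can be recomputed by enumerating every single-bit flip of every 8-bit canonical Gray-coded genotype; the sketch below does so (function names hypothetical). Flipping any one Gray bit always flips the lowest bit of the decoded value, which is why every mutation size is odd.

```python
from collections import Counter

def from_gray(g):
    """Decode a canonical Gray-coded genotype to its integer trait value."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

def mutation_sizes(n_bits):
    """Count |child value - parent value| over every single-bit flip of every genotype."""
    sizes = Counter()
    for g in range(2 ** n_bits):
        for i in range(n_bits):
            sizes[abs(from_gray(g ^ (1 << i)) - from_gray(g))] += 1
    return sizes

sizes = mutation_sizes(8)
print(all(size % 2 == 1 for size in sizes))  # True: no even-valued mutation sizes
print([sizes[s] for s in (1, 3, 5, 7, 9)])   # [510, 254, 126, 126, 62]
```

The counts fall by about half each time the mutation size passes the next value of the form 2^k - 1, giving the step-wise, roughly exponential decay described above.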



5 Implications 

The threat to simulation modelling posed by the biases described above may 
seem overworked. However, the analyses clearly reveal the potential for the de- 
sign of mutation operators to systematically influence trajectories of evolutionary 
change. One kind of solution to this problem has already been described, namely, 
to attempt the construction of bias-free operators. However, we have seen that 
while such attempts may alter the biases exhibited by an operator, these biases 
are typically not extinguished. It is also the case that whereas the most natu- 
ralistic mutation operators (e.g., Replace, or Absorb) might generate the most 
striking effects, the biases exhibited by these operators are explicable in terms 
of selection pressures which are present in naturally evolving populations. 

For example, natural selection, ceteris paribus, favours genotypes which (i) 
have relatively low mutation rates, and (ii) are reasonably far from non-viable 
mutants [10,11] (but see [12,13]). Between them, these two selective forces can 
account for the biases exhibited by the most straightforward mutation opera- 
tors considered here (Absorb, Repeat, and Replace). However, the biases of the 
operators designed specifically to negate these selection pressures (e.g., Ignore, 
Reflect and Wrap) are more difficult to account for in terms of natural selection 
pressures, as a result of their unnatural treatment of trait boundaries. 

How are simulation modellers to decide between competing mutation operators?
Should they choose mutation operators that are clearly biased, but in a
manner underwritten by natural selection, or choose alternative operators which,
although in some sense “less biased”, exhibit idiosyncrasies that have no
biological analogue? I propose that this problem can be resolved by addressing
a more pressing question: how are simulation modellers to guard against the 
artefactual results which may be caused by such biases? 

Ultimately, in order to answer this question, the kinds of biases exhibited by 
various classes of mutation operator must be better understood, and attention 
must be paid to the conditions in which these biases will affect simulation results 
most severely. For instance, it appears that the influence of mutation bias will 
be strongest when populations are under only weak selection pressure. This situ- 
ation may occur when populations drift across fitness plateaux, or during initial 
evolutionary transients, which may be especially sensitive to population make-up.
However, mutation biases may still influence the evolution of populations
under strong selection pressures if these pressures conflict, by tipping the
balance towards one of them rather than the other. These are issues demanding
further analysis.

In the absence of a principled understanding of mutation bias, however, two 
steps can be taken to minimise its influence. The presence of potential artefacts
can be detected through exploring (i) the sensitivity of the simulation’s behaviour 
to variation of its initial conditions (a procedure which has a number of other 
advantages and should thus be undertaken in any case), and (ii) the sensitivity 
of the simulation’s behaviour to variation of the mutation operator employed (a 
less generally useful technique, but one which may be necessary if mutation bias 
is suspected). 
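Technique (ii) can be illustrated with a toy experiment. The sketch below applies three boundary-handling rules to a trait in [0, 1] under neutral drift (no selection) and compares how far the population ends up from its nearest boundary. The rules are hypothetical stand-ins loosely modelled on Absorb (clamp at the boundary), Reflect, and Repeat (redraw until in range); the definitions used in this paper may differ, and the Gaussian step size is an assumption.

```python
import random


def clamp(x, rng, sigma):
    """Absorb-like: a step past a boundary leaves the value on the boundary."""
    return min(1.0, max(0.0, x + rng.gauss(0.0, sigma)))


def reflect(x, rng, sigma):
    """Reflect-like: a step past a boundary bounces back into [0, 1]."""
    y = x + rng.gauss(0.0, sigma)
    while not 0.0 <= y <= 1.0:
        y = -y if y < 0.0 else 2.0 - y
    return y


def redraw(x, rng, sigma):
    """Repeat-like: draw new steps until the result lies inside [0, 1]."""
    while True:
        y = x + rng.gauss(0.0, sigma)
        if 0.0 <= y <= 1.0:
            return y


def mean_boundary_distance(operator, n=2000, generations=100, sigma=0.1, seed=0):
    """Neutral drift: mutate every value each generation, then report the mean
    distance from the nearest boundary (smaller means values piled up at 0 or 1)."""
    rng = random.Random(seed)
    pop = [rng.random() for _ in range(n)]
    for _ in range(generations):
        pop = [operator(x, rng, sigma) for x in pop]
    return sum(min(x, 1.0 - x) for x in pop) / n


if __name__ == "__main__":
    for op in (clamp, reflect, redraw):
        print(op.__name__, round(mean_boundary_distance(op), 3))
```

Reflection leaves an initially uniform population uniform (mean distance about 0.25), whereas clamping accumulates mass exactly on the boundaries. A difference of this kind in a full simulation would flag the operator itself as a source of extreme-valued traits.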

Here the application of these two techniques to the example with which this 
paper opened will be described. In order to explore whether the bold coloration 
of the coevolved signal patterns was an artefact brought about by mutation 
bias, a previous study [2] explored the sensitivity of the result to manipulation 
of the initial ancestral population: populations of signals were able to evolve 
away from maximally bold ancestors, but they tended to re-converge on bold 
colours, although not necessarily those with which the populations were seeded. 

Here the second line of sensitivity analysis will be reported. Simulations
identical to those carried out in [2] were implemented, save that the mutation
operator was varied across three conditions: Flat, Repeat, and Absorb. Boldness,
$b$, was calculated as $b = 1 - \sqrt{\frac{4}{3n}\sum_{i=1}^{n}\left(r_i^2 + g_i^2 + b_i^2\right)}$,
where $r_i$, $g_i$, and $b_i$ are the distances of the red, green, and blue colour
components of the $i$th of the $n$ elements of a signal pattern from their nearest
boundary (i.e., 0 or 1). The mean boldness (across 100 simulation runs) of
coevolved signals was sensitive to changes in mutation operator in the direction
predicted by the analyses presented in this paper. Whilst the Absorb operator
($\bar{b} = 0.87$, $\sigma_{\bar{b}} = 0.006$) resulted in coevolved signals which were
significantly bolder than those coevolved under either the Flat or Repeat regimes
($\bar{b} = 0.81$, $\sigma_{\bar{b}} = 0.004$, and $\bar{b} = 0.82$, $\sigma_{\bar{b}} = 0.004$,
respectively), all three mutation operators generated signals which were bolder
than random signals ($\bar{b} = 0.5$).

These results demonstrate that whilst the simulation is sensitive to the dif- 
ferent characteristics of different mutation operators, mutation bias is not the 
underlying cause of the boldness of the evolved signals. While Enquist and Arak’s 
original hypotheses may in fact be correct, an alternative explanation is that bold 
signals exploit an inherent preference for extreme-valued inputs on the part of 
the simple artificial neural networks with which they coevolve [14,2]. 




6 Conclusion 

The potential for a mutation operator’s inherent biases to influence the dynam- 
ics of evolutionary simulation models was demonstrated for a range of prevalent 
mutation operators. Although these mutation biases may be the source of
artefactual results, they are also part and parcel of natural selection. Rather
than attempting to eliminate them from evolutionary simulations, modellers should
tolerate their presence and control for it. Greater understanding of the role of
mutation bias in shaping the evolutionary dynamics of both natural and artificial
evolutionary systems can only increase our understanding of the parallels and
dissimilarities between artificial and natural evolutionary processes.
Abstract. A version of the standard genetic algorithm, in which the
mutation rate is allowed to evolve freely, is applied across a set of opti- 
misation problems. The resulting dynamics confirm the hypothesis that 
mutation rate, when allowed to evolve, will do so partly as a function of 
altitude in the fitness landscape. Further, it is demonstrated that this fact 
can be exploited in order to improve efficiency of the genetic algorithm 
when applied to a particular class of optimisation problem. Specifically, 
significant efficiency gains are established in those problems in which the 
fitness function is not stationary over time. 



1 Introduction 

In most cases, mutation is far more likely to be detrimental to the future survival 
of a genome than it is to be beneficial [1,2]. Given a basic principle of selection, 
if mutation rate is allowed to evolve, this will generally result in an evolutionary 
pressure favouring lower rates of mutation [3]. However, this pressure will vary 
[4]. The more highly optimised a particular genome is, the stronger the pressure 
will be to push mutation rate down. For a poorly adapted genome, there will 
be many more ways in which it can change so as to increase fitness. This will 
result in a much lower downward pressure on mutation rate, or for very poorly 
adapted genomes, selection towards higher rates. 

These principles define the basic hypothesis that is to be tested and explored: 
if allowed to evolve, mutation rate will do so as an inverse function of altitude 
in the fitness landscape. This has potential applications with regard to the op- 
timisation of certain genetic algorithms. In particular it suggests an improved 
performance over non-stationary optimisation problems. As the fitness landscape 
changes over time, the mutation rate will be able to adapt in order to maintain 
an improved rate of optimisation in such problems. 

A substantial amount of literature exists already on parameter tuning and 
control in GAs, including much significant work on mutation. The most
comprehensive surveys of this work can be found in [5] and [6]. As can be seen from
these papers, work on self-adaptive mutation has received most attention with 
regard to Evolution Strategies. Comparatively little work has been focused on 
self-adaptive mutation rates in GAs at an individual level, as is considered here. 
2 Method

To explore the hypothesis, a genetic algorithm is used in which the mutation 
rate for each chromosome is allowed to evolve independently. For comparative 
purposes, the same GA with a fixed mutation rate is also considered. 

The two algorithms are applied over a suite of ten optimisation problems. 
These functions are described in detail below. Each of the problems was run a 
total of 100 times for each algorithm, and the results averaged. 

The algorithms used were based around a standard bit-string GA, as de- 
scribed in Holland [7]. Additionally, each chromosome contained a floating point 
number between 0 and 1, to be used as the mutation rate for that chromosome. 
Single-point crossover was used, together with tournament selection. 

A two-part mutation operator was used. Firstly, the bit-string portion of the 
chromosome was mutated with a probability equal to the mutation rate portion 
of the chromosome. In the fixed mutation rate algorithm, this was set at 0.01. 
The second stage of mutation was only carried out in the self-adaptive GA. Each 
mutation rate was itself mutated with a probability equal to itself. A mutated 
mutation rate was assigned a random floating-point value from 0 to 1. 
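A minimal Python sketch of this two-part operator follows. It assumes that the chromosome's rate is applied independently to each bit; the `Chromosome` and `mutate` names are illustrative.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Chromosome:
    bits: list[int]
    rate: float  # this chromosome's own mutation rate, in [0, 1]


def mutate(c: Chromosome, rng: random.Random, self_adaptive: bool = True) -> Chromosome:
    # Stage 1: flip each bit independently with probability c.rate
    # (applying the rate per bit is an assumption of this sketch).
    bits = [b ^ 1 if rng.random() < c.rate else b for b in c.bits]
    rate = c.rate
    # Stage 2, self-adaptive GA only: the rate mutates with probability equal
    # to itself, and a mutated rate is replaced by a uniform draw from [0, 1).
    if self_adaptive and rng.random() < c.rate:
        rate = rng.random()
    return Chromosome(bits, rate)


if __name__ == "__main__":
    rng = random.Random(0)
    print(mutate(Chromosome([0, 1, 0, 1, 1, 0], 0.3), rng))
```

Because the rate is both the per-bit flip probability and the probability of its own replacement, a chromosome with a high rate is also more likely to have that rate redrawn.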

All parameters were based on the results of the studies by De Jong [8], Grefen- 
stette [9], and Schaffer, Caruana, Eshelman, and Das [10]. A further guiding 
factor was that the parameters should be chosen so as to maximise the effect of 
mutation rate in each case. The small population size of 20 is the lower limit 
suggested by these studies. As De Jong suggests, a small population size will 
increase the role that mutation has to play. Likewise, the crossover rate of 0.75 
was at the lower bound of the ranges suggested. 

A variety of starting conditions were tried for the mutation rate of the self- 
adaptive GA. These included having various fixed rates across the entire popula- 
tion, and randomly setting mutation rate for each individual. However, regardless 
of initial settings, the behaviour of the algorithm settled into the same pattern 
after approximately 40-50 generations. In any sort of extended run, therefore, 
initial mutation rates are of little consequence. An initial mutation rate of 0.01 
across the entire population (identical to that used in the standard GA) was 
used for those specific examples discussed below. 
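The overall algorithm can be sketched as below, using the parameters given above (population 20, crossover rate 0.75, initial mutation rate 0.01). Several details are not specified in the text and are assumptions of this sketch: binary tournaments, generational replacement, a 20-bit string, and a child inheriting the mutation rate of its first parent. The fitness function also receives the generation number so that non-stationary problems can be plugged in.

```python
import random


def run_ga(fitness, n_bits=20, pop_size=20, generations=100,
           p_crossover=0.75, init_rate=0.01, self_adaptive=True, seed=0):
    """Bit-string GA whose individuals carry their own mutation rate.

    fitness(bits, generation) -> number. Returns the best fitness of each
    generation and the mutation rates of the final population.
    """
    rng = random.Random(seed)
    pop = [([rng.randint(0, 1) for _ in range(n_bits)], init_rate)
           for _ in range(pop_size)]
    best_per_generation = []
    for gen in range(generations):
        scores = [fitness(bits, gen) for bits, _ in pop]
        best_per_generation.append(max(scores))

        def tournament():
            # Binary tournament: the fitter of two random individuals wins.
            i, j = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[i] if scores[i] >= scores[j] else pop[j]

        offspring = []
        while len(offspring) < pop_size:
            (bits_a, rate), (bits_b, _) = tournament(), tournament()
            if rng.random() < p_crossover:
                cut = rng.randrange(1, n_bits)  # single-point crossover
                child = bits_a[:cut] + bits_b[cut:]
            else:
                child = list(bits_a)
            # Two-part mutation: bits first, then (optionally) the rate itself.
            child = [b ^ 1 if rng.random() < rate else b for b in child]
            if self_adaptive and rng.random() < rate:
                rate = rng.random()
            offspring.append((child, rate))
        pop = offspring
    return best_per_generation, [rate for _, rate in pop]


if __name__ == "__main__":
    history, rates = run_ga(lambda bits, gen: sum(bits))
    print(history[-1], sum(rates) / len(rates))
```

Setting `self_adaptive=False` turns the same loop into the fixed-rate comparison GA, since every rate then stays at its initial value.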

3 The Test Suite 

The three stationary problems are primarily derived from Ackley [11]. They cover
three common optimisation situations: a linear fitness function, a fitness function 
containing many broad plateaus, and a function containing a large number of 
local maxima. 

The linear function can be represented by the formula $f(x) = c$, where $f(x)$ is
the fitness of the chromosome x, and c is the number of 1 bits in the bit-string 
part of the chromosome. 

The plateau function is defined as follows: divide the bit-string portion of the 
chromosome into four equal length groups. If all the bits in a group are equal 
to 1 then add 25 to the fitness value, otherwise add 0. The fitness landscape for
this function will have 4 plateaus leading to a peak of fitness 100. 

The third function is what Ackley describes as a ‘porcupine’ function with 
many local maxima, and is defined as follows: $f(x) = 10c - 15\,(1 - \mathrm{parity}(c))$,
where $\mathrm{parity}(c)$ effectively returns a value of 0 if $c$ is odd, and 1 if $c$ is even.
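The three stationary fitness functions can be written directly from the definitions above. This Python sketch assumes that the bit-string length is a multiple of four for the plateau function.

```python
def linear(bits):
    """f(x) = c, the number of 1 bits."""
    return sum(bits)


def plateau(bits):
    """25 points for each of four equal-length groups that is all 1s."""
    assert len(bits) % 4 == 0, "sketch assumes a length divisible by 4"
    size = len(bits) // 4
    groups = [bits[i * size:(i + 1) * size] for i in range(4)]
    return sum(25 for g in groups if all(b == 1 for b in g))


def porcupine(bits):
    """f(x) = 10c - 15(1 - parity(c)), with parity(c) = 1 if c is even, else 0."""
    c = sum(bits)
    parity = 1 if c % 2 == 0 else 0
    return 10 * c - 15 * (1 - parity)


if __name__ == "__main__":
    x = [1, 1, 0, 0, 1, 1, 1, 1]
    print(linear(x), plateau(x), porcupine(x))
```

The porcupine penalty means that every odd count of 1s sits 15 below the surrounding even counts, which is what produces the many local maxima.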

The next six problems in the test suite are non-stationary variants of these 
three problems. Each of the three has two variants, one with a cycle length of 25 
generations, and one with a cycle length of 50 generations. In each case, every 
time the specified number of generations is reached, the function ‘flips’ such that 
instead of c being equal to the number of 1s in a bit-string, it becomes equal to
the number of 0s (or vice versa).
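One way to build the non-stationary variants is to wrap a count-based fitness function so that, during every second cycle, it is evaluated on the complemented bit-string; counting 1s in the complement is the same as counting 0s in the original. The wrapper name and the `(bits, generation)` signature are illustrative.

```python
def count_ones(bits):
    """Linear fitness: the number of 1 bits."""
    return sum(bits)


def make_flipping(base, cycle):
    """Return fitness(bits, generation) that swaps the roles of 1s and 0s
    every `cycle` generations by scoring the complemented bit-string."""
    def fitness(bits, generation):
        flipped = (generation // cycle) % 2 == 1
        return base([1 - b for b in bits] if flipped else bits)
    return fitness


if __name__ == "__main__":
    f25 = make_flipping(count_ones, 25)
    print([f25([1, 1, 0], g) for g in (0, 24, 25, 49, 50)])
```

The same wrapper applies unchanged to the plateau and porcupine functions, giving the six non-stationary problems with cycle lengths of 25 and 50.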

The final problem is the non-stationary blind knapsack problem, taken from 
Goldberg [12]. The basic knapsack problem is as follows: given a number of 
objects, each with a specified weight and value, maximise the value of objects 
carried in a knapsack, given that it can only hold a limited weight. In the non- 
stationary, blind version of this problem, the weights and values of the objects 
periodically change, and there is no prior knowledge of any particular weights 
or values - only the total weight and value of a given combination is available. 
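The text does not spell out how the blind knapsack instances are generated, so the following is only a hypothetical sketch: weights and values are drawn uniformly from [1, 10] and redrawn every `period` generations, and an overweight selection scores 0. Goldberg's original formulation may handle these details differently. The GA receives only a single score per selection, which is what makes the problem blind.

```python
from __future__ import annotations

import random


class BlindKnapsack:
    """Hypothetical non-stationary blind knapsack (assumed parameters)."""

    def __init__(self, n_items: int, capacity: float, period: int, seed: int = 0):
        self.n_items, self.capacity = n_items, capacity
        self.period, self.seed = period, seed

    def items(self, generation: int) -> list[tuple[float, float]]:
        # (weight, value) pairs, deterministic per epoch so that re-evaluating
        # a generation gives the same answer.
        epoch = generation // self.period
        rng = random.Random(self.seed * 1_000_003 + epoch)
        return [(rng.uniform(1, 10), rng.uniform(1, 10)) for _ in range(self.n_items)]

    def __call__(self, bits: list[int], generation: int) -> float:
        chosen = [item for bit, item in zip(bits, self.items(generation)) if bit]
        weight = sum(w for w, _ in chosen)
        value = sum(v for _, v in chosen)
        # Only the total value is revealed; overweight selections score 0.
        return value if weight <= self.capacity else 0.0


if __name__ == "__main__":
    knapsack = BlindKnapsack(n_items=10, capacity=25.0, period=50)
    print(knapsack([1, 0, 1, 1, 0, 0, 1, 0, 0, 1], generation=0))
```

An instance can be passed straight to a GA whose fitness function takes `(bits, generation)`.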

4 The Results 




Fig. 1. Graph of 25 generation ‘porcupine’ fitness function 



The first set of runs involved testing the GA with self-adaptive mutation rate 
on each of the problems from the test suite. On the stationary problems, the 
mutation rate remains fairly constant throughout. On the non-stationary prob- 
lems the mutation rate initially behaves in an identical manner. However, as the 
fitness function changes, large changes in the mutation rate are evident. Drops 
in fitness can be seen to be highly correlated with increases in mutation rate, 
with higher fitness levels being correlated with lower mutation rates. 
Fig. 2. Graph of 50 generation ‘plateau’ fitness function



After generating results for the self-adaptive GA, the test suite was then 
run using the standard GA, and the results plotted against the first set. On 
the stationary problems the behaviour of the two algorithms is almost identical. 
However, on the non-stationary problems the self-adaptive GA does consistently 
outperform the standard GA, both in terms of rate of increase of fitness and
maximum fitness achieved. Even on those problems in which the standard GA does
at times outperform the self-adaptive GA, it does so with much less consistency
(see, for example, the non-stationary plateau function in Fig. 2).

5 Discussion 

The results clearly indicate that levels of fitness and mutation rate are related. 
Low fitness levels result in a definite selective advantage for higher mutation 
rates, with increasing fitness resulting in a selective advantage for reduced mu- 
tation. 

In all of the problems tested, the performance of the self-adaptive GA ei- 
ther equals or surpasses that of the standard GA. In particular, as expected, 
significant improvement can be seen in the non-stationary problems. It can be 
noted that in most cases the self-adaptive mutation rate stays at levels well 
above the 0.01 mutation rate set for the standard GA. It might be thought that 
the increased performance of the self-adaptive algorithm is simply due to this 
increased mutation rate, rather than self-adaptation. To test this, various values 
were tried for the fixed mutation rate. In most cases 0.01, or lower, was found to 
be optimum. For example, with mutation rate fixed at 0.01 in the 50 generation 
linear problem the average maximum fitness obtained over 100 runs was 19.88 
for the first 50 generation cycle, 19.23 for the second. With the mutation rate 
increased to 0.1, comparable to the higher values of the self-adaptive mutation 
rate, the average maximum fitness drops to only 18.12 at the end of the first 
cycle, and 18.38 at the end of the second. 
Traditionally, mutation has been considered very much a secondary operator
within genetic algorithms (dating back to Holland [7]), with crossover playing 
the dominant role. However, this may in part be due to the particular range of 
problems from which optimum GA parameters have been derived. De Jong’s [8] 
test suite, for example, consists entirely of stationary optimisation problems. It 
is evident that in some cases mutation can play a significant role in improving 
the performance of GAs. 

The mutation dynamics and sorts of fitness landscape considered here are
also common in biological systems. A self-adaptive GA of this sort could therefore
also be of use in modelling these mechanisms in natural evolution. Future work 
will explore this possibility. 
Abstract. Isaacs’ treatise on differential games was a breakthrough for
the analysis of the pursuit-and-evasion (PE) domain within the context
of strategies representable by differential equations. Current experimen- 
tal work in Artificial Life steps outside of the formalism of differential 
games, but the formalism it steps into is yet to be identified. We in- 
troduce a formulation of PE that allows a formalism to be developed. 
Our game minimizes kinematic factors and instead emphasizes the in- 
formational aspect of the domain. We use information-theoretic tools 
to describe agent behavior and implement a pursuit strategy based on 
statistical decision making; evaders evolved against this pursuit strategy 
exhibit a wide range of sophisticated behavior that can be quantitatively 
described. Agent performance is related to these quantifiables. 



1 Introduction: Body and Mind in Pursuit and Evasion 

Researchers in the field of Artificial Life (ALife) frequently turn to the domain 
of pursuit-and-evasion (PE) to study the (co)evolution of complex agent
behavior. PE is of particular interest because it provides a parsimonious
framework for agent interaction and is also of ethological interest. Isaacs’ ground-breaking theory
of differential games [15] shows that, for a great many forms of the PE game, op- 
timal (minimax) player strategies can be derived analytically from knowledge of 
the players’ abilities. The applicability of differential game theory is constrained 
by two requirements: 1) a finite set of state variables must completely capture 
the instantaneous description of the game (e.g., agent positions, velocities, ac- 
celerations, etc.) necessary to compute the future unfolding of the game — the 
game is one of perfect information; 2) agent strategies must be expressible as 
differential equations that operate on these state variables. These constraints 
make the theory (and its more modern versions, e.g., [8]) a powerful model of 
agent kinematics (e.g., speed, maneuverability, etc.), as kinematics are partic- 
ularly amenable to differential analysis. Indeed, differential game theory was 
constructed to answer questions about agents that are distinguished primarily 
in their kinematic abilities. 

Much ALife research, however, steps outside of the formal assumptions made 
by Isaacs’ theory; evolvable sensory apparatus [5] and evolvable information- 
processing substrates for agent control, such as artificial recurrent neural net- 
works, move the PE domain beyond the purely kinematic realm to include an 
informational one as well. Clearly, differential game theory remains an
influential formalism in the design and analysis of experiment kinematics. But
what formalism has been adopted to address the informational aspects of current
experimental work? The lack of a rigorous metric of agent behavior has been
recognized [7]: how do we ascertain and describe the sophistication of an agent?
Recently, information-theoretic tools have been used to measure and adjust en- 
vironmental complexity to facilitate agent learning in various domains, including 
PE [11, 13]; information theory has also come into use for the measurement of 
agent behavior, for example in a discrete state space [23] and a “linear” form 
of the PE game [10]. Here, we will introduce the use of information theory to 
measure agent behavior and environmental complexity in a two-dimensional, 
continuous-space pursuer-evader domain. 

The purpose of replacing (stateless) minimax strategies with evolvable agent 
controllers is to place the pursuit and evasion roles within a coevolutionary 
setting such that an arms race between ever more sophisticated evasion and 
pursuit strategies arises [16, 19, 24, 6, 7, 12, 26]. Unfortunately, the envisaged 
coevolutionary arms race towards complexity is yet to be substantially realized, 
or, at least, observed; a number of problematic issues surrounding coevolutionary 
techniques are known to exist, such as the Red Queen Effect [6], mediocre stable- 
states [22], and the instability of two-population coevolution [4, 2, 9]. 

If we want an arms race to complexity, we must consider the three compo- 
nents that are responsible for the behavior of the typical PE agent: its sensory 
apparatus, kinematics, and processing abilities. Which of these three components 
possess the potential when evolved to engender an arms race to complexity? How 
might asymmetries between the pursuer and evader in terms of some of these 
components (say, their sensory and kinematic abilities) diminish the usefulness 
of evolving others (say, their computational abilities)? We assert that the role 
of processing ability is yet to be appreciated in its own right — that it has been 
overshadowed by kinematic and sensory factors. 

In this paper, we present a simple formulation of the two-dimensional pursuer- 
evader game that, by giving the players kinematic parity and a form of sensory 
asymmetry, requires them to model opponent behavior in order to achieve opti- 
mal performance (in the game-theoretic sense). The behavior of an agent deter- 
mines the complexity of the statistical model required to adequately represent it. 
The success of an agent reflects the power of its representational and modeling 
abilities. In this sense, there exists the potential for an arms race in complexity. 

We begin, however, by using simple evolution to evolve evaders against hand- 
built predictors that employ statistical tools to model evader behavior. The ques- 
tions of interest are: what happens when the statistical model of the pursuer is 
poor? What kind of behavior must an evader exhibit to defeat statistical predic- 
tion of some finite power? Would such evasion behavior resemble our intuitive 
understanding of protean behavior [19]? Our results show that evaders are able 
to evolve behaviors of substantial sophistication when placed in opposition to a 
powerful pursuit strategy. 

We start with a formal definition of the game and discuss some game-theoretic
features. We review the information-theoretic tools used to measure behavior.
Experimental results are then discussed. Finally, we point to future work. 



2 A Game of Incomplete Information 

2.1 Definition 

Our game is played in the real-valued, unbounded (two-dimensional) plane. Both 
agents move with simple motion: each agent moves at a constant velocity and is 
able to instantaneously change its direction by any amount. In our game, both 
agents move at the same velocity of one body-length per time-step. While the 
agents are able to freely pick a direction, they can only do so at the beginning of 
each time-step; they must then commit to moving in their chosen directions for 
the duration of one time-step. The pursuer can see the evader’s current location, 
but the evader is completely blind. Thus, the pursuer has the opportunity to 
respond to the behavior of the evader, while the evader’s behavior is ballistic. 

Each game begins with the pursuer and evader occupying the same location, 
the plane’s origin. Because we are interested in statistical characterizations of 
agent behavior, each game must be long enough for a reasonably representative 
sample of behavior to be made with respect to the type of statistic we wish 
to collect. (In practice, the behaviors of our evolving evaders quickly achieve 
steady state: initial transients are either very short or unfold so slowly as to be
indistinguishable from steady-state behavior for our purposes.) 

Thus, the terminating condition occurs when the final time-step transpires, 
rather than when the evader gets tagged or reaches some “escape” distance. The 
evader’s job is to maximize the average distance between itself and the pursuer 
over the course of the game. The pursuer’s job is to maximize the number of 
tags, which is subtly more general than minimizing distance, as we will discuss. 
A tag occurs when the pursuer comes within one body-length of the evader. 
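The rules of the game can be sketched as a short simulation. This sketch assumes headings in radians, counts a tag whenever the post-move separation is at most one body-length, and averages the separation measured after each move. The `chase` policy (head straight for the evader's current location) is included only as a simple pursuer for testing; it is not the paper's statistical pursuer.

```python
import math


def move(pos, heading):
    """Simple motion: advance one body-length along `heading` (radians)."""
    return (pos[0] + math.cos(heading), pos[1] + math.sin(heading))


def chase(pursuer, evader):
    """Naive pursuit: head for the evader's current location."""
    return math.atan2(evader[1] - pursuer[1], evader[0] - pursuer[0])


def play(evader_headings, pursuer_policy, tag_radius=1.0):
    """Return (average separation, number of tags) for one game."""
    evader = pursuer = (0.0, 0.0)                   # both start at the origin
    total_distance, tags = 0.0, 0
    for heading in evader_headings:                 # the blind evader's moves are fixed
        p_heading = pursuer_policy(pursuer, evader) # pursuer sees the evader now
        evader, pursuer = move(evader, heading), move(pursuer, p_heading)
        d = math.dist(evader, pursuer)
        total_distance += d
        tags += d <= tag_radius                     # tag: within one body-length
    return total_distance / len(evader_headings), tags


if __name__ == "__main__":
    print(play([0.0, math.pi / 2, math.pi / 2, 0.0], chase))
```

Because the evader is blind, its whole strategy reduces to the sequence of headings passed in, while the pursuer's policy is re-evaluated every time-step from the evader's observed position.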

2.2 Game Theoretic Features 

This game is designed to require players to induce models of opponent behav- 
ior (the type of model is discussed below). Here we explore the game-theoretic 
features that create such conditions. To begin, we wish to eliminate kinematic 
asymmetries that may alone favor one agent over the other. For this reason, we 
give both players identical kinematic abilities: simple motion with equal veloci- 
ties. The choice to make our evader blind appears to contradict the care taken to 
give the agents an equal footing. To understand this choice, let us consider what 
happens if the evader has the ability to see. Further, let us assume the predictor 
is strictly trying to minimize distance, such that the game is clearly zero-sum. 

If the sightedness of both players is common knowledge [18, 14], i.e., known
by both players, then this modified game clearly calls for a minimax strategy 
[15] to be used. The minimax strategy calls for the pursuer to move directly 
towards the evader and the evader to move directly away from the pursuer. Any 
divergence from this strategy by an agent gives an advantage to its opponent;
if both agents observe their optimal strategies, then the distance between them 
does not change. Given our kinematic constraints, minimax allows neither the 
evader to increase its distance from the pursuer, nor the pursuer to near the 
evader, let alone tag it — a very dull game, indeed. This is why agent kinematics 
are made asymmetrical in most experimental work; the possibility of “cognitive” 
asymmetry is thus marginalized. 

We can now appreciate what happens if the evader is blind. Because the 
evader is unaware of the pursuer’s location, it cannot simply move away. Instead,
the evader must employ a mixed strategy that picks moves probabilistically. If 
the pursuer knows this to be the case, through common knowledge, then the 
pursuer is best advised to adopt a statistical decision process [27] based upon 
observations of the evader’s behavior. If the pursuer’s goal is not only to decrease 
the distance between itself and the evader, but also to do so as quickly as possible 
(in order to maximize the number of tags), then pursuit based upon statistical 
observation will, on average, very likely outperform (and certainly do no worse 
than) the simple pursuit strategy that minimax specifies above, provided that 
the statistical observations are suitably accurate. If the evader knows, again 
through common knowledge, that the pursuer is using such a strategy, then it is 
no less motivated to behave probabilistically. 

In his discussion of pursuit games with incomplete information, Isaacs [15] 
explains that the importance of accurate prediction diminishes as the distance 
between opponents increases; if the “probability cloud” of possible future evader 
locations is small compared to the distance between the evader and the pursuer, 
then the cloud is reasonably treated as a single point by the pursuer — obviating 
the need to predict and allowing the pursuer to revert to the simple strategy of 
moving directly towards the evader. On the other hand, if the two agents are very 
close, particularly if their regions of possible future locations overlap, accurate 
prediction is of utmost importance. With spatial proximity, the statistical nature 
of the game becomes more prominent. For this reason, we choose to begin our 
game with the pursuer and evader on top of one another. 



3 Observation and Statistical Reasoning 

Because the evader moves at a constant rate, and commits to some direction 
of travel for the duration of each time-step, the pursuer need only observe the 
sequence of direction choices that the evader makes; if the pursuer accurately 
models the evader’s decision process, then effective pursuit is possible. Though 
the directions in which the agents move (and the locations they occupy) vary 
continuously, the pursuer’s sensing of movement has limited resolution: each 
move of the evader is perceived by the pursuer as being in one of eight directions, 
as shown in Fig. 1. Thus, the pursuer’s statistics model the symbol string
generated by the evader’s behavior, and not the evader’s behavior directly. 
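The eight-way sensing can be sketched as a quantiser of turn angles. This sketch assumes that each of the eight 45° sectors is centred on a multiple of 45° (so a turn of 0° ± 22.5° is symbol 0); the exact sector boundaries used in the experiments are not stated.

```python
import math


def turn_symbol(prev_heading, new_heading, n_symbols=8):
    """Map the turn from prev_heading to new_heading (degrees) to a symbol.

    Sector k is centred on a turn of k * 360 / n_symbols degrees.
    """
    width = 360.0 / n_symbols
    turn = (new_heading - prev_heading) % 360.0  # normalise to [0, 360)
    return int(math.floor((turn + width / 2) / width)) % n_symbols


if __name__ == "__main__":
    headings = [0, 10, 100, 95, 300]
    print([turn_symbol(a, b) for a, b in zip(headings, headings[1:])])
```

Only these symbols, not the continuous headings, are available to the pursuer's statistical model.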

To predict the future path of the evader, the pursuer uses its statistical model 
to generate the symbol string it expects the evader to produce in the coming 
time-steps. The pursuer then uses this symbol string to project the expected 
trajectory relative to the evader’s actual current location. The pursuer then acts
according to its policy, detailed below. Even if the symbol string is predicted 
with complete accuracy, the projected trajectory only approximates the path 
the evader will actually take, due to the pursuer’s limited ability to sense and 
represent heading. Nevertheless, the eight-symbol resolution is high enough for 
adequate approximation in practice. 
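Projecting a predicted symbol string back into the plane might look like the following sketch, which reads symbol k as a turn of k × 45° relative to the current heading, followed by a one-body-length move. The function name and the use of sector centres are assumptions. Because a symbol identifies only a sector, the projected points approximate the evader's real path.

```python
import math


def project(symbols, start, heading, n_symbols=8):
    """Return the predicted positions after each symbol.

    start: evader's current (x, y); heading: its current heading in degrees.
    """
    x, y = start
    points = []
    for k in symbols:
        heading = (heading + k * 360.0 / n_symbols) % 360.0  # apply the turn
        x += math.cos(math.radians(heading))                   # one body-length
        y += math.sin(math.radians(heading))
        points.append((x, y))
    return points


if __name__ == "__main__":
    print(project([0, 2, 0, 6], start=(3.0, 4.0), heading=0.0))
```

The resulting list of points is the predicted path p that the pursuit rule in Sect. 4.1 searches for a reachable target.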

In its capacity as model-builder, our hand-built pursuer uses techniques that 
we, as experimenters, can use to quantitatively describe an evader’s behavior. 
Our analytic approach comes from information theory [25]. The model of an 
evader is built by computing statistics of various orders from observations of
behavior. The order of the statistic refers to the amount of conditioning applied
when computing the probabilities of various symbols occurring. The higher the order
of our statistics, the more we can refine our expectations of what will happen
in the future based on what we have observed further in the past. For example,
a 0-order statistic states the probability of a symbol occurring, $p(S)$, without
conditioning, based strictly on the number of times it is observed; if 10% of
the symbols observed are the symbol ‘X’ then the 0-order probability of ‘X’ is
$p(X) = 0.1$. A 1-order statistic states the probability of a symbol conditioned
with respect to the symbol that immediately precedes it; if the symbol ‘Y’ is
followed by ‘X’ 60% of the time, then one of the 1-order probabilities of ‘X’ is
$p(X \mid Y) = 0.6$. We now know more about when to expect the symbol ‘X’.
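The k-order statistics described above amount to counting which symbol follows each length-k context. A minimal sketch (function names illustrative):

```python
from collections import Counter, defaultdict


def order_k_counts(seq, k):
    """Count how often each symbol follows each length-k context."""
    counts = defaultdict(Counter)
    for i in range(k, len(seq)):
        counts[tuple(seq[i - k:i])][seq[i]] += 1
    return counts


def prob(counts, context, symbol):
    """Conditional probability p(symbol | context); use context=() for order 0."""
    c = counts[tuple(context)]
    total = sum(c.values())
    return c[symbol] / total if total else 0.0


if __name__ == "__main__":
    seq = "YXYXYZYXYX"
    print(prob(order_k_counts(seq, 0), (), "X"))
    print(prob(order_k_counts(seq, 1), ("Y",), "X"))
```

In this sequence 40% of the symbols are ‘X’, but ‘Y’ is followed by ‘X’ four times out of five, so the 1-order statistic sharpens the prediction made by the 0-order one.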

The entropy of an observed sequence describes the over-all predictability of 
the sequence, given a particular order statistic; higher entropy values indicate 
less predictability. Certain symbol sequences require higher order statistics to be 
taken than other sequences to achieve the same level of predictability — the same 
entropy. The minimal amount of behavioral history, or conditioning, required to 
maximize predictability is an indication of a sequence’s complexity. Of equal im- 
portance is the degree to which a sequence becomes less predictable as the order 
of the statistic is decreased. These two characterizations of a symbol sequence 
provide a quantifiable description of behavior. 
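The entropy of a sequence under a k-order statistic can be computed as a conditional entropy over the observed contexts. The sketch below measures it in bits per symbol; the logarithm base is an assumption, as the paper does not state it.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(seq, k):
    """H_k = -sum over contexts c and symbols s of p(c, s) * log2 p(s | c)."""
    counts = defaultdict(Counter)
    for i in range(k, len(seq)):
        counts[tuple(seq[i - k:i])][seq[i]] += 1
    n = len(seq) - k  # number of (context, symbol) observations
    h = 0.0
    for ctx_counts in counts.values():
        ctx_total = sum(ctx_counts.values())
        for c in ctx_counts.values():
            h -= (c / n) * math.log2(c / ctx_total)
    return h


if __name__ == "__main__":
    seq = "AAB" * 40
    print([round(conditional_entropy(seq, k), 3) for k in range(4)])
```

For a period-three string such as "AAB" repeated, the 1-order entropy is still positive (an ‘A’ may be followed by ‘A’ or ‘B’), but the 2-order entropy drops to zero, so two symbols of history are the minimum needed to maximise predictability.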
Fig. 1. Pursuer sensing of evader movement. Each turn of the evader generates one of
eight symbols, depending on the angle of the turn relative to the evader’s previous heading.
4 Experiments

4.1 Setup 

Evolving Evader Using an enhanced version of GNARL [3], we evolve artificial 
recurrent neural networks to control our evader agent. The GNARL algorithm 
adapts Evolutionary Programming (EP) techniques to the evolution of neural 
networks. Since the evader is blind, there are no inputs to the network; a single 
bias node, which has a constant activation, is used. The network has a single 
real-valued output that indicates the turn angle for the next move. Networks 
are allowed to have as many as 60 hidden units and 400 weights. These limits 
are not known to be optimal in any sense (casual examination of the larger 
evolved networks shows around 50 hidden nodes but only 100 weights). The 
initial population consists of random networks that have between 5 and 15
hidden units and between 25 and 100 weights. The population size is 100. The fitness of a
controller network is equal to the average distance it is able to maintain between 
the pursuer and the evader over the course of a game. 
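To make the setup concrete, the sketch below shows how a blind controller can still produce a varied trajectory, and how the fitness described above is scored. It assumes a fully connected tanh hidden layer driven only by the constant bias node and its own previous activations, with the single output scaled to a turn angle in (−π, π); GNARL networks actually have an evolved, arbitrary topology, and the function names are hypothetical.

```python
import math


def blind_evader_path(recurrent, bias, readout, steps, speed=1.0):
    """Trajectory of a hypothetical blind recurrent controller.

    recurrent[i][j]: weight from hidden unit j to hidden unit i
    bias[i]:         weight from the constant bias node to hidden unit i
    readout[i]:      weight from hidden unit i to the single output
    The output is squashed into a turn angle in (-pi, pi).
    """
    n = len(bias)
    h = [0.0] * n
    x = y = heading = 0.0
    path = [(x, y)]
    for _ in range(steps):
        # No sensory inputs: the state evolves from the bias and itself only.
        h = [math.tanh(bias[i] + sum(recurrent[i][j] * h[j] for j in range(n)))
             for i in range(n)]
        turn = math.pi * math.tanh(sum(w * a for w, a in zip(readout, h)))
        heading += turn
        x += speed * math.cos(heading)
        y += speed * math.sin(heading)
        path.append((x, y))
    return path


def fitness(evader_path, pursuer_path):
    """Average pursuer-evader distance over a game (higher is better)."""
    if len(evader_path) != len(pursuer_path) or not evader_path:
        raise ValueError("paths must be non-empty and of equal length")
    total = sum(math.dist(e, p) for e, p in zip(evader_path, pursuer_path))
    return total / len(evader_path)
```

With all weights set to zero the output is always zero, so the evader simply runs in a straight line; non-trivial recurrent weights are what let a network without inputs generate loops, turns and periodic patterns.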

Hand-Built Pursuer The evader controller networks are evolved against a 
hand-built pursuit strategy that uses statistical modeling of evader behavior. 
This pursuit strategy is controlled by two parameters: 1) the power of the sta- 
tistical model, which is specified by the order statistic, o, and 2) the number of 
time-steps into the future, t, that will be predicted by the model. The more the 
pursuit strategy looks into the future, the more opportunity it has to discover 
and exploit short-cuts to predicted future locations of the evader. However, larger values of t also bring a greater risk of being misled by poor predictions.

At each time-step of the game, the pursuit strategy proceeds as follows: using the statistical model of order o, predict the evader’s movement from its current location for t time-steps into the future; this yields the path $p = \{p_1, p_2, \ldots, p_t\}$. Assuming the predictions to be correct, we know when the evader will arrive at each point $p_i$ in the path $p$. If there exist any points which the pursuer, by heading directly to them, can reach before the evader, select the earliest such point in the path. Otherwise, move towards the first point on the path, $p_1$.
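The per-step rule can be sketched as follows. This is a minimal illustration, assuming a fixed pursuer speed per time-step and straight-line motion; the function names and the tie-breaking rule (arriving at the same time as the evader counts as interception) are assumptions rather than details given in the text.

```python
import math


def choose_target(pursuer, predicted_path, speed):
    """Pick the pursuer's target from the predicted evader path.

    predicted_path[i - 1] is p_i, where the evader is expected after i steps.
    Returns the earliest p_i the pursuer can reach in at most i steps when
    heading straight for it (ties count as interception), else p_1.
    """
    for i, point in enumerate(predicted_path, start=1):
        if math.dist(pursuer, point) <= i * speed:
            return point
    return predicted_path[0]


def step_towards(position, target, speed):
    """Move at most `speed` units in a straight line towards `target`."""
    d = math.dist(position, target)
    if d <= speed:
        return target
    return (position[0] + speed * (target[0] - position[0]) / d,
            position[1] + speed * (target[1] - position[1]) / d)


path = [(5.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
target = choose_target((0.0, 0.0), path, speed=1.0)
print(target)                                       # (2.0, 0.0)
print(step_towards((0.0, 0.0), target, speed=1.0))  # (1.0, 0.0)
```

Here $p_1$ is out of reach after one step, but $p_2$ is exactly two units away, so the pursuer heads for $p_2$ rather than chasing the evader's next position.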

The pursuer computes its statistical model of evader behavior tabula rasa for 
each game. Because the model requires a suitably large sample of behavior to
give meaningful predictions, we preface each match between the pursuer and an 
evader with a “warm-up” period of ten thousand time-steps during which the 
statistical model of the evader’s ballistic behavior is induced. The length of this 
warm-up period represents a practical compromise between sample quality and 
simulator speed. After the model is built, the players are placed on the origin of 
the plane and the game proper begins, which lasts for one thousand time-steps. 
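The warm-up model and the path prediction it feeds can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: it predicts the single most frequent successor of the last o symbols (greedy, with ties broken by first occurrence), and it assumes, purely for illustration, that symbol k from Fig. 1 corresponds to a relative turn of k × 45°, since the exact angular binning is not given here. `OrderModel` and `symbols_to_path` are hypothetical names.

```python
import math
from collections import Counter, defaultdict


class OrderModel:
    """Order-o frequency model of a symbol sequence (hypothetical helper)."""

    def __init__(self, order):
        self.order = order
        self.table = defaultdict(Counter)  # context tuple -> next-symbol counts

    def train(self, symbols):
        for i in range(self.order, len(symbols)):
            self.table[tuple(symbols[i - self.order:i])][symbols[i]] += 1

    def predict(self, history, t):
        """Greedily predict up to t future symbols, feeding predictions back."""
        hist = list(history)
        out = []
        for _ in range(t):
            ctx = tuple(hist[-self.order:]) if self.order else ()
            counts = self.table.get(ctx)
            if not counts:  # unseen context: stop predicting
                break
            symbol = counts.most_common(1)[0][0]
            out.append(symbol)
            hist.append(symbol)
        return out


def symbols_to_path(start, heading, symbols, speed=1.0):
    """Turn symbols into points, assuming symbol k means a turn of k * 45 degrees."""
    x, y = start
    path = []
    for k in symbols:
        heading += k * math.pi / 4
        x += speed * math.cos(heading)
        y += speed * math.sin(heading)
        path.append((x, y))
    return path


model = OrderModel(order=1)
model.train([0, 1, 2] * 20)       # warm-up: induce the model
future = model.predict([0], t=4)  # [1, 2, 0, 1]
print(symbols_to_path((0.0, 0.0), 0.0, future))
```

The resulting list of points plays the role of the predicted path $p$ that the pursuit rule searches for an interception point.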

4.2 Results 

We outline above a statistical formalism with which we can characterize evader 
behavior, and describe how we embed this formalism into our hand-built pursuit 
strategy. We now evolve evaders against this strategy. By thus inserting our
statistical formalism into the evolutionary process, however, we lose, ironically, 
the ability to use it as an objective measure of experimental results; while we do 
provide some quantitative results below, we must also pay careful attention to 
the qualitative results for meaningful understanding. Our formalism can more 
properly be used for agent analysis once both the evader and pursuer are made 
to coevolve; this is the subject of future work. 



Pursuer Performance Figure 2 (left) captures a moment during a match. The square and the large circle represent the positions of the pursuer and evader at time t, respectively. The small circles are the pursuer’s prediction (using 12th-order statistics), made at time t − 1, of the evader’s path for the next twelve time steps. We see that the evader is exactly where the pursuer expected it to be at time t. The large ‘X’ represents the point on the evader’s path where the pursuer expects to most quickly intercept the evader; it is towards this point that the pursuer moved directly from time-step t − 1 to time t. Of course, interception will occur only if the predictions are correct. The evader’s behavior might instead lead the pursuer into a false expectation of future locations, which is the only way the evader can open space between itself and the pursuer. This is what we see in Figure 2 (right), which shows a much weaker pursuer (using 3rd-order statistics) playing against the same evader at the identical point in the match. Clearly, and to its detriment, this pursuer has a very different model of the evader and very different expectations of future evader behavior.




Fig. 2. Pursuit using 12th-order statistics (left) and 3rd-order statistics (right).



Evader Behavior Though the evaders are blind, they too must model their opponent to succeed. In this case, the model is induced through evolution and reflects the settings of the pursuer’s two parameters: the order of the pursuer’s statistical observations, and the number of time-steps predicted into the future. A very wide variety of evader behaviors evolve in our experiments. The trajectories of two different evaders, A and B, are shown in Figure 3 (top left and right, respectively). Both were evolved in the same run with pursuit parameters set to t = 37 and o = 4; evader A is the best of generation 825, while B arises much earlier in the run and is the best of generation 85. Evader A has a highly irregular, though smooth, trajectory that contains tight loops, broad and narrow turns, and some relatively straight stretches. Evader B’s behavior is extremely regular (in fact periodic) and jagged, and displays two frequencies of movement (one superimposed on the other), which gives it a hint of self-similarity. Both evaders clearly maintain non-trivial internal state.

Below each trajectory, we graph the entropy of the trajectory (dotted line), the average distance maintained between the evader and the pursuer (solid line), and the average tag rate (dash-dot line); these values are given for a pursuer with parameter t = 37 and parameter o ranging from 0 to 9. Though the entropy of evader B, when measured with 0-order statistics, is actually higher than that of evader A (approximately 1.9 for B and 1.3 for A, out of a maximum possible entropy of 3.0), the entropy of evader A falls only slightly as the order statistic is increased, whereas the entropy of evader B drops dramatically. These entropy curves reflect what is visibly obvious: evader A is very difficult to predict and evader B is very easy. In particular, once the pursuer models evader B as a 9th-order process, the regular behavior of the evader becomes evident: the observed entropy falls to a mere 0.16, the average distance the evader is able to maintain from the pursuer drops to 0.37 body-length units, and the pursuer’s tag rate jumps to a very effective 0.98%. Changes in evader A’s performance are also evident as modeling power increases, but they are not nearly as dramatic.

These graphs suggest that proteanism (adaptively unpredictable behavior) may be envisioned as a continuum; a behavior may be protean relative to a weak statistical model but not to a more powerful one. While a high entropy value implies a behavior that is difficult to predict, it does not necessarily imply an effective evasion strategy; Isaacs recognizes that, in a game of incomplete information, a tension may exist between moves that create uncertainty and moves that open distance. Good evasion behavior consists of a balanced mix of these types of moves. Thus, while evader B has a higher 0-order entropy than evader A, its behavior does not produce such a mix and poorly leverages the uncertainty it generates.

As a simple control experiment, we modify the pursuer to move directly towards the evader’s current location (as in minimax); the evader then evolves the appropriate minimax response, i.e., straight fleeing. This provides further empirical evidence that evolution is sensitive to the pursuit strategy and creates evasion behaviors accordingly. Thus, effective evasion is not synonymous with complex evasion; the complexity of the evasion behaviors we see in our main experiment reflects the sophistication of our hand-built pursuit strategy. In coevolutionary frameworks, an absence of evader “proteanism,” as conceived here, indicates a similar lack of pursuer sophistication.

5 Conclusion 

We introduce a formulation of the pursuer-evader game that emphasizes its infor- 
mational component. We discuss statistical tools that enable rigorous analysis 
of evader behavior and, thereby, allow the construction of a powerful pursuit 
strategy. Evolution against this pursuit strategy results in evasion behaviors of 
considerable sophistication. Agent performance is relatable to quantifiable fea- 
tures of behavior. 
Fig. 3. Evader trajectories and performance curves. Trajectories begin at (0, 0).

Future work will replace the hand-built statistical mechanism of the pursuer with a coevolving substrate. The generality of our hand-built pursuer is what evokes evasion behavior with (often complex) statistical structure. But, unlike the blind evaders, pursuers cannot operate ballistically; they must respond to observed behavior. If coevolving pursuers are not exposed to a wide enough diversity of evasion behavior, they will likely evolve pursuit strategies that lack generality; these specialized strategies will be suboptimal in the game-theoretic sense. Nevertheless, our statistical tools will be of particular use in tracking coevolutionary progress [9]. Finally, we will investigate sighted evaders.
Abstract. A gene expression system, n-BDD (n-output Binary Decision Diagram), was proposed in order to investigate co-evolution [5]. Although the system is suitable for behavior models of agents, it does not include crossover. This paper proposes a crossover operation using Bryant’s Apply operation [2]. The operation makes an n-BDD probabilistically inherit the two functions expressed by two parent n-BDDs. In an experiment, the proposed method achieved more than 40% higher fitness than the conventional method. Moreover, in another environment where carnivores and herbivores co-evolve, we observed a food chain relation.



1 Introduction 

Co-evolution evolves two species that interact in a shared environment. The relation between carnivores and herbivores is a typical example of co-evolutionary competition, or of the interactive food-chain relationship found in the natural world. Research on co-evolution and food chain models has therefore been gathering attention in the fields of evolutionary computation and artificial life.

T. Takashina et al. studied an artificial competitive world and co-evolution [6] using a gene expression system based on finite state automata (FSA), which was proposed in [3]. The efficiency of genetic algorithms depends on the gene expression system. Moriwaki et al. [5] proposed a system using n-BDDs, which extend Binary Decision Diagrams (BDDs) to express functions on any finite domain. The effectiveness of the method was shown in many applications.

The method in [5] with n-BDDs does not have a crossover operation, however. 
This paper proposes a crossover operation for n-BDDs and shows that it works 
well in a co-evolutionary model. 

2 A gene expression n-BDD and genetic operations 

BDDs, proposed by Akers [1], are graph representations of Boolean functions and have been used in many engineering fields. A BDD has terminal nodes labeled by true or false, while an n-BDD [5] can have more than two labels and gives a value from any set of values that the labels denote.

* Currently working for Seiko Epson Corp.

** Currently working for NTT Communicationware Corp.

Fig. 1. Genetic operations for n-BDDs: (a) mutation, (b) insertion, (c) deletion.

In Fig. 1, a circle denotes an input bit and is called a decision node. An output value is denoted by a square, which is called a terminal node. Given the input bits, an output value is calculated as follows: first look at the top decision node $X_1$ and take the left outgoing edge (called a 0-edge) if $X_1$ is 0, or the right edge (a 1-edge) if $X_1$ is 1. Repeat this for the decision node pointed to by the 0-edge or 1-edge until an edge points to a terminal node, which gives the output. Genetic operations [5] are defined on n-BDDs (see Fig. 1). These operations are restricted so that they never violate the order of decision nodes; consequently, no loops can occur.
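The evaluation procedure just described can be sketched in a few lines. The node classes and the example diagram are hypothetical, chosen only for illustration; the labels reuse action names from Table 2, and variables are indexed by their position in the fixed order (X0 to X7, as in Table 1).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    value: str  # output label, e.g. an action name


@dataclass(frozen=True)
class Node:
    var: int      # index i of the decision variable X_i
    low: "NBDD"   # child reached through the 0-edge
    high: "NBDD"  # child reached through the 1-edge


NBDD = Union[Leaf, Node]


def evaluate(bdd: NBDD, bits) -> str:
    """Follow 0-edges and 1-edges from the top node until a terminal is reached."""
    node = bdd
    while isinstance(node, Node):
        node = node.high if bits[node.var] else node.low
    return node.value


# Hypothetical gene: walk unless hungry (X0); if hungry, run away when a
# carnivore is near (X3), otherwise do nothing. X0 precedes X3 in the order.
bdd = Node(0, Leaf("walk"), Node(3, Leaf("do-nothing"), Leaf("runaway")))
print(evaluate(bdd, [1, 0, 0, 1, 0, 0, 0, 0]))  # runaway
```

Because every path visits variables in increasing index order, the traversal always terminates, which is the property the order-preserving genetic operations protect.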

3 Crossover based on Bryant’s Apply operation 

Here we give an alternative notation for n-BDDs. A node of an n-BDD is either a non-terminal node, represented by a triple (variable name, left-node, right-node), or a terminal node, represented by a value. An n-BDD is a pair consisting of a set of nodes and a node that represents the top node. We equate an n-BDD (and the function $f$ represented by it) with its top node $(f.top, f_0, f_1)$.

The procedure Apply developed by Bryant [2] takes two BDDs representing functions $f$ and $g$, and a binary Boolean operator $\circ$, and produces a reduced BDD representing the function $f \circ g$ defined as

$$(f \circ g)(X_1, X_2, \ldots, X_n) = f(X_1, X_2, \ldots, X_n) \circ g(X_1, X_2, \ldots, X_n).$$
Instead of a binary Boolean operator $\circ$, our Apply, extended to n-BDDs, takes a binary operator on a given set of values. The Shannon expansion lets us expand the above expression as follows:

$$
(f \circ g)(X_1, \ldots, X_i, \ldots, X_n) =
\begin{cases}
f(X_1, \ldots, X_{i-1}, 0, X_{i+1}, \ldots, X_n) \circ g(X_1, \ldots, X_{i-1}, 0, X_{i+1}, \ldots, X_n) & \text{if } X_i = 0 \\
f(X_1, \ldots, X_{i-1}, 1, X_{i+1}, \ldots, X_n) \circ g(X_1, \ldots, X_{i-1}, 1, X_{i+1}, \ldots, X_n) & \text{if } X_i = 1
\end{cases}
\tag{1}
$$

Noting that $f(X_1, \ldots, X_{i-1}, 0, X_{i+1}, \ldots, X_n) = f_0$ and $f(X_1, \ldots, X_{i-1}, 1, X_{i+1}, \ldots, X_n) = f_1$, where $f.top = X_i$, we can reformulate Equation (1) as $f \circ g = (f.top, f_0 \circ g_0, f_1 \circ g_1)$, assuming $f.top = g.top$. If $f.top$ ranks higher than $g.top$ in the variable order, or $f.top$ does not appear in $g$, then $f \circ g = (f.top, f_0 \circ g, f_1 \circ g)$; otherwise $f \circ g = (g.top, f \circ g_0, f \circ g_1)$. The algorithm in Fig. 2 follows this idea. The order of variables must be fixed. If we choose conjunction for $\circ$, Apply computes a BDD for $f \wedge g$. For our purpose we use a probabilistic operator as $\circ$: for two values $a$ and $b$, $a \circ b$ is $a$ or $b$, each with probability 0.5.

    If f or g is a terminal node or f = g then h ← f ∘ g
    Else if f.top = g.top then h0 ← f0 ∘ g0, h1 ← f1 ∘ g1
        if h0 = h1 then h ← h0 else h ← (f.top, h0, h1)
    Else if rank of f.top is higher than that of g.top then h0 ← f0 ∘ g, h1 ← f1 ∘ g
        if h0 = h1 then h ← h0 else h ← (f.top, h0, h1)
    Else if rank of g.top is higher than that of f.top then h0 ← f ∘ g0, h1 ← f ∘ g1
        if h0 = h1 then h ← h0 else h ← (g.top, h0, h1)

Fig. 2. An algorithm to calculate h (= f ∘ g).

Then $f \circ g$ inherits the behavior of both $f$ and $g$ probabilistically; that is, $f \circ g$ obeys $f$ for some inputs and $g$ for others. This is the expected behavior of a crossover operation. The operator $\circ$ acts probabilistically only while the BDD of $f \circ g$ is being constructed; once built, the BDD is fixed, so it behaves deterministically when used.

The time complexity of crossover for two n-BDDs and the size of the resulting BDD are $O(|f| \cdot |g|)$, where $|f|$ and $|g|$ are the numbers of nodes in $f$ and $g$. In our experiment, however, the number of nodes did not grow very much.
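A minimal sketch of the probabilistic Apply follows, assuming frozen dataclasses for nodes so that structurally equal sub-diagrams compare equal, and assuming that “higher rank” means “earlier in the fixed variable order” (smaller index). When either argument is terminal, the sketch picks one whole argument at random, which is how we read the first case of Fig. 2 under the probabilistic operator. The memo table plays the role of Bryant’s result cache, bounding the work by the O(|f| · |g|) argument pairs noted above; it also means each pair of sub-diagrams is resolved once, so the child is fixed after construction. All names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    value: str


@dataclass(frozen=True)
class Node:
    var: int      # smaller index = earlier in the fixed variable order
    low: object   # 0-edge child
    high: object  # 1-edge child


def make(var, low, high):
    """Build a node, dropping it when both edges lead to the same sub-diagram."""
    return low if low == high else Node(var, low, high)


def crossover(f, g, rng=None):
    """Probabilistic Apply, following the case analysis of Fig. 2."""
    rng = rng or random.Random()
    memo = {}  # (f, g) -> result; at most |f| * |g| distinct pairs

    def apply(f, g):
        if (f, g) in memo:
            return memo[(f, g)]
        if isinstance(f, Leaf) or isinstance(g, Leaf) or f == g:
            h = f if rng.random() < 0.5 else g  # a o b: a or b with prob. 0.5
        elif f.var == g.var:
            h = make(f.var, apply(f.low, g.low), apply(f.high, g.high))
        elif f.var < g.var:  # f.top comes first in the variable order
            h = make(f.var, apply(f.low, g), apply(f.high, g))
        else:
            h = make(g.var, apply(f, g.low), apply(f, g.high))
        memo[(f, g)] = h
        return h

    return apply(f, g)


def evaluate(bdd, bits):
    while isinstance(bdd, Node):
        bdd = bdd.high if bits[bdd.var] else bdd.low
    return bdd.value


f = Node(0, Leaf("walk"), Leaf("runaway"))
g = Node(1, Leaf("approach"), Leaf("do-nothing"))
child = crossover(f, g, random.Random(42))
for bits in ([0, 0], [0, 1], [1, 0], [1, 1]):
    # every output of the child comes from one of its parents
    assert evaluate(child, bits) in (evaluate(f, bits), evaluate(g, bits))
```

Hashing nested frozen dataclasses recomputes structure on each lookup, which is fine for a sketch; a production BDD package would instead keep a unique table so that equal sub-diagrams share one object.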

4 An experiment with a simple competition problem 

In this section, we compare n-BDDs with crossover and n-BDDs without it on a simple competition problem introduced in [5]. In the environment, two artificial animals act: a carnivore and a herbivore. The problem is to let the herbivore evolve to overcome a strong carnivore.

Inputs and outputs of n-BDDs Both animals act according to their own n-BDDs. The meanings of the input bits are shown in Table 1. The order of the variables also follows this table. A gene outputs a value from the five actions given in Table 2. A carnivore does not output runaway.

Experimental setting The animals act according to their n-BDDs in a 20 × 20 array field. The carnivore has a fixed strategy (i.e. a fixed n-BDD) (Fig. 3) and tries to capture the herbivore, which is evolved using each of the methods. By observing the evolution of the herbivore’s behavior, we examine the effect of each method. Thirty plants are distributed randomly in the field. When a plant is eaten, another one appears in a random position. A herbivore may die of hunger or by being eaten. The number of steps for which the herbivore survives is taken as its fitness. Selection preserves the five elite individuals out of the thirty herbivores in each generation.
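The text does not say how the remaining 25 herbivores of each generation are produced. The sketch below assumes they are offspring of the five elites, generated by a caller-supplied variation function `make_child`; both that name and `next_generation` are hypothetical.

```python
import random


def next_generation(population, fitness, make_child, n_elite=5, rng=None):
    """Keep the n_elite fittest unchanged and refill the rest with their offspring.

    make_child(parent_a, parent_b, rng) is a hypothetical variation step
    (e.g. n-BDD crossover, or mutation of parent_a alone).
    """
    rng = rng or random.Random()
    elites = sorted(population, key=fitness, reverse=True)[:n_elite]
    children = [make_child(rng.choice(elites), rng.choice(elites), rng)
                for _ in range(len(population) - len(elites))]
    return elites + children
```

Passing the variation step in as a function lets the same loop compare the two conditions of the experiment: a mutation-only `make_child` for the conventional method and a crossover-based one for the proposed method.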

Result We conducted 100 runs of the experiment. Fig. 4 shows the average of the maximum fitness of herbivores for both methods. The fitness of n-BDDs with crossover clearly rises faster than that of n-BDDs without it.



Table 1. Bit assignments.

  Bit   Meaning
  X0    hungry
  X1    repletion
  X2    carnivore is visible far
  X3    carnivore is visible near
  X4    herbivore is visible far
  X5    herbivore is visible near
  X6    plant is visible far
  X7    plant is visible near

Table 2. Actions.

  Action              Description
  walk (W)            Animal moves to one of its eight neighboring cells at random.
  runaway (R)         Animal moves in the direction opposite to a carnivore.
  (name illegible)    Animal moves toward food and eats it if reached.
  do-nothing (N)      Animal does not move.
  approach (A)        Animal approaches an animal of the same kind.
Fig. 3. The carnivore’s fixed strategy.

Fig. 4. Fitness transition of the evolving herbivore against the fixed carnivore.



5 A model for co-evolution in a food chain model 

Here we describe another experiment in a food chain model, where many herbivores and many carnivores exist and are evolvable, and new animals can be born. Its aim is to observe co-evolution among competing animals through the food chain. The food chain model proposed in [6] and also in [5] does not have crossover; that is, a child is generated from a single parent by mutation, insertion or deletion. In our model, a child can be generated from two parents by crossover.

A setting in the model The field is a 150 × 150 array. A herbivore (a carnivore) acquires energy by eating plants (herbivores, respectively). Its energy decreases by a constant amount with every action until it becomes zero. When it reaches zero, the animal dies and plants grow around the body.¹

Application of the genetic operations in the model The crossover operation lets us construct a natural model for reproduction. When an animal has enough energy, it tries to reproduce with another member of its species. When it is difficult to find a partner, the other genetic operations are used instead.

For reproduction, an animal (the mother) looks for a companion as a partner. If partners exist in its 5 × 5 neighborhood, the one with the highest energy is selected as the partner (the father). A child is produced by Apply, and the energy is distributed equally between mother and child. The father and its energy remain unchanged. If the mother fails to find a partner, it generates a child by mutation, insertion or deletion. The partner selection procedure is expected to act as a form of selection, as is the rule that only animals with high energy may have children.
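The reproduction rule can be sketched as follows, under stated assumptions: the energy threshold for reproduction is a free parameter (its value is not given here), the child is placed on the mother's cell, and boundary handling on the 150 × 150 field is ignored. `Animal`, `reproduce` and the `crossover`/`mutate` callbacks are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class Animal:
    species: str    # "herbivore" or "carnivore"
    x: int
    y: int
    energy: float
    genome: object  # e.g. an n-BDD


def reproduce(mother, animals, crossover, mutate, threshold, rng):
    """Return a child of `mother`, or None if she lacks the energy to reproduce."""
    if mother.energy < threshold:
        return None
    # Same-species partners inside the 5 x 5 neighbourhood (Chebyshev radius 2).
    partners = [a for a in animals
                if a is not mother and a.species == mother.species
                and abs(a.x - mother.x) <= 2 and abs(a.y - mother.y) <= 2]
    if partners:
        father = max(partners, key=lambda a: a.energy)  # father is left unchanged
        genome = crossover(mother.genome, father.genome, rng)
    else:
        genome = mutate(mother.genome, rng)  # mutation, insertion or deletion
    mother.energy /= 2  # energy shared equally between mother and child
    return Animal(mother.species, mother.x, mother.y, mother.energy, genome)
```

Choosing the highest-energy neighbour as father is what gives partner choice its selective effect: fitter animals, which have accumulated more energy, pass on their genes more often.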

Results Fig. 5 shows an example of the population transition of animals and plants during the most stable period. The populations peak in turn: every peak in the herbivore population follows a peak in the carnivore population. This shows that a food chain arose through evolution during this period. The Lotka-Volterra equations [4], which model predator and prey populations in an ecosystem with a food chain, predict that when the carnivore population is plotted on the abscissa and the herbivore population on the ordinate, the graph exhibits cyclic dynamics. This was observed in our model, as shown in Fig. 6.

¹ A dead animal fertilizes the land through the action of many living things, after which plants grow. We simplified this in the model.
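For reference, the classical Lotka-Volterra predator-prey system can be written as below, with $x$ the prey (herbivore) population, $y$ the predator (carnivore) population, and positive constants $\alpha, \beta, \gamma, \delta$. The quantity $V$ is conserved along every trajectory, which is why the orbits in the $(x, y)$ plane are closed cycles rather than spirals. This is the textbook model, not a fit to the simulation data.

$$
\frac{dx}{dt} = \alpha x - \beta x y, \qquad \frac{dy}{dt} = \delta x y - \gamma y,
$$

$$
V(x, y) = \delta x - \gamma \ln x + \beta y - \alpha \ln y, \qquad
\frac{dV}{dt} = (\delta x - \gamma)(\alpha - \beta y) + (\beta y - \alpha)(\delta x - \gamma) = 0.
$$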

6 Discussions 

We defined a crossover operation for n-BDDs based on Bryant’s Apply operation and confirmed its effectiveness by experiments. In a simple competition problem, it achieved more than 40% higher fitness than the conventional method once fitness had saturated. Moreover, in the food chain model with the crossover operation, we observed a food chain relation. Table 3 shows the average number of steps until either the herbivores or the carnivores were wiped out, over 100 trials. The food chain relation persisted longer in the system using crossover than in the conventional method. We expect animals to evolve faster because crossover is performed with a high-energy partner. This expectation should be confirmed by future analysis.

Table 3. Average steps continued until either carnivores or herbivores are wiped out.

           without crossover    with crossover
  steps    2230 ± 3627          10371 ± 10860

Fig. 5. Populations of carnivores, herbivores and plants (a stable period).

Fig. 6. Populations of carnivores and herbivores (from 24000 to 26000 steps).
Abstract. Von Neumann’s architecture for self-reproducing, evolvable machines is described. From this starting point, a number of issues relating to self-reproduction and evolution are discussed. A summary is given of various arguments which have been put forward regarding the superiority of genetic reproduction over self-inspection methods. It is argued that programs in artificial life platforms such as Tierra reproduce genetically rather than by self-inspection (as has previously been claimed). However, the distinction is blurred because significant parts of the reproduction process in Tierran programs are implicitly encoded in the Tierran operating system. The desirable features of a structure suitable for acting as a seed for an open-ended evolutionary process are discussed. It is found that the properties of such a structure are somewhat different from those of programs in Tierra-like platforms. These analyses suggest ways in which the evolvability of individuals in artificial life platforms may be improved, and also point to a number of open questions.



1 Introduction 

In the late 1940s and early 1950s, John von Neumann devoted considerable time 
to the question of how complicated machines could evolve from simple machines.¹
Specifically, he wished to develop a formal description of a system that could 
support self-reproducing machines which were robust in the sense that they could 
withstand some types of mutation and pass these mutations on to their offspring. 
Such machines could therefore participate in a process of evolution. 

Inspired by Alan Turing’s earlier work on universal computing machines [3], 
von Neumann devised an architecture which could fulfil these requirements. The 
machine he envisaged was composed of three subcomponents [2]: 

1. A general constructive machine, A, which could read a description $\phi(X)$ of another machine, X, and build an instance of X from this description:

$$A + \phi(X) \rightsquigarrow X \tag{1}$$

* This paper is an abbreviated version of certain sections of [1]. 

¹ Von Neumann had difficulties in defining precisely what the term ‘complicated’ meant. He said “I am not thinking about how involved the object is, but how involved its purposive operations are. In this sense, an object is of the highest degree of complexity if it can do very difficult and involved things.” [2].
(where + indicates a single machine composed of the components to the left and right suitably arranged, and $\rightsquigarrow$ indicates a process of construction.)

2. A general copying machine, B, which could copy the instruction tape:

$$B + \phi(X) \rightsquigarrow \phi(X) \tag{2}$$

3. A control machine, C, which, when combined with A and B, would first activate B, then A, then link X to $\phi(X)$ and cut them loose from (A + B + C):

$$A + B + C + \phi(X) \rightsquigarrow X + \phi(X) \tag{3}$$

Now, if we choose X to be (A + B + C), then the end result is:

$$A + B + C + \phi(A + B + C) \rightsquigarrow A + B + C + \phi(A + B + C) \tag{4}$$

This complete machine plus tape, $[A + B + C + \phi(A + B + C)]$, is therefore self-reproducing. From the point of view of the evolvability of this architecture, the crucial feature is that we can add the description of an arbitrary additional automaton D to the input tape. This gives us:

$$A + B + C + \phi(A + B + C + D) \rightsquigarrow A + B + C + D + \phi(A + B + C + D) \tag{5}$$

Furthermore, notice that if the input tape $\phi(A + B + C + D)$ is mutated in such a way that the description of automaton D is changed, but those of A, B and C are unaffected (that is, the mutated tape is $\phi(A + B + C + D')$), then the result of the construction will be:

$$A + B + C + D + \phi(A + B + C + D') \rightsquigarrow A + B + C + D' + \phi(A + B + C + D') \tag{6}$$

The reproductive capability of the architecture is therefore robust to some 
mutations (specifically, those mutations which only affect the description of D), 
so the machines are able to evolve. Von Neumann pointed out that the action of 
the general copying automaton, B, was the decisive step which gave his archi- 
tecture the capacity for evolving machines of increased complexity, because B 
is able to copy the description of any machine, no matter how complicated [2] 
(p. 121). This ability is clearly demonstrated in Reaction 5 above.
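A deliberately crude toy model can make Reactions (5) and (6) concrete. It is an illustration under heavy assumptions, not a model of von Neumann's cellular automaton: a machine is just a tuple of component names, a tape is a frozen record of such a tuple, and `construct`, `copy_tape` and `reproduce` are hypothetical stand-ins for A, B and C. It shows that a mutation confined to D's description is inherited, while a mutation that corrupts the description of B yields an offspring that cannot reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tape:
    description: tuple  # phi(X): names of the components of machine X


def construct(tape):
    """A: build the machine described by the tape."""
    return tuple(tape.description)


def copy_tape(tape):
    """B: copy any tape, whatever (and however long) it describes."""
    return Tape(tuple(tape.description))


def reproduce(machine, tape):
    """C: activate B, then A, then attach the tape copy to the new machine."""
    if not {"A", "B", "C"} <= set(machine):
        raise ValueError("machine cannot reproduce: A, B or C is missing")
    new_tape = copy_tape(tape)
    offspring = construct(tape)
    return offspring, new_tape


parent = ("A", "B", "C", "D")
tape = Tape(parent)
print(reproduce(parent, tape) == (parent, tape))  # True: Reaction (5)

mutant_tape = Tape(("A", "B", "C", "D'"))
child, child_tape = reproduce(parent, mutant_tape)
print(child)  # ('A', 'B', 'C', "D'"): Reaction (6)
```

Note that `copy_tape` never inspects what the description means; it copies any tuple of any length, which is the toy counterpart of von Neumann's point that B can copy the description of any machine, no matter how complicated.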

2 General Issues of Reproduction 

The major focus of this paper is self-reproduction in the specific context of 
evolution. However, before continuing it is useful to briefly consider some more 
general issues relating to reproduction. 

When looking at any sort of reproduction, it is helpful to look at the process 
by which reproduction is accomplished from a number of different perspectives. 
Two important ones are: 
1. The degree to which the reproduction process is explicitly encoded on the
configuration being reproduced, rather than being implicit in the physical 
laws of the world. 

2. The number of different configurations that exist, connected by mutational 
pathways, that are capable of reproducing their specific form (i.e. the dis- 
tinction between limited hereditary reproducers and indefinite hereditary re- 
producers). From the point of view of an individual reproducer, this can be 
expressed in terms of the proportion of all possible mutations it may exper- 
ience that will result in the production of distinct, yet viable, reproducers. 

There are a number of points to note about these distinctions. First it should 
be said that (2), in contrast to (1), does not properly relate to individual repro- 
ducers per se, but rather to lineages of reproducers. It is therefore not relevant 
when considering self-reproduction in and of itself, but is an important factor 
when considering the evolutionary potential of a class of reproducers. 

Secondly, the two distinctions are generally independent of each other, al- 
though the more explicitly encoded the reproduction algorithm is, the less likely, 
in general, it is to be an indefinite hereditary reproducer (because of the increased 
chance of mutations disrupting the copying process; see Section 3.3). 

3 Self-Reproduction and Open-Ended Evolution 

I now wish to return to issues of reproduction in the specific context of evolution. 
In this section I will concentrate on a number of these issues in turn. 

3.1 Trivial versus Non-Trivial Reproduction

Notice that in much of the recent artificial life work with self-reproduction (e.g. [4]), the distinction between trivial and non-trivial self-reproduction is perceived to be a distinction on the implicit-explicit axis.² However, from an evolutionary point of view, the limited-indefinite heredity axis is clearly the more relevant one. Indeed, this is exactly what von Neumann himself says: “One of the difficulties in defining what one means by self-reproduction is that certain organizations, such as growing crystals, are self-reproductive by any naive definition of self-reproduction, yet nobody is willing to award them the distinction of being self-reproductive. A way around this difficulty is to say that self-reproduction includes the ability to undergo inheritable mutations as well as the ability to make another organism like the original” [2].

Barry McMullin has presented an enlightening discussion on the history of 
the confusion over von Neumann’s work, which he refers to as the ‘von Neumann 
Myth’ (see, for example, Section 4.2.7 in [5]). One result of this confusion has 

² From this point of view, an example of trivial reproduction in a cellular automaton space would be one where the state of a single cell is reproduced in neighbouring cells purely due to the CA’s transition rules.
been that the majority of subsequent research concerning this issue of trivial self-reproduction has concentrated on the implicit-explicit distinction, rather than the limited-indefinite heredity distinction.

Von Neumann’s work on self-reproduction concerned the question of how 
machines might be able to evolve increased complication in order to perform 
increasingly complex tasks. This is why his design for a self-reproducing machine 
had to be capable of universal construction, and why it was designed in such a 
way that it could withstand some kinds of mutations. 

3.2 Genetic Reproduction versus Self-Inspection 

Von Neumann’s architecture was designed specifically to allow for a possible 
increase in complexity and efficiency of machines by evolution. However, even if 
we accept that his design is a solution to this problem, it is by no means the 
only conceivable solution, as von Neumann himself was well aware. In particular, 
he also discussed the possibility of a machine which built a copy of itself by 
actively inspecting its parts, without the need for this design information to 
be duplicated on a tape (i.e. without a ‘genetic’ description). Indeed, systems 
which reproduce by self-inspection have been designed by Laing, e.g. [6], and 
by Ibanez and colleagues [7]. From the point of view of designing artificial life 
systems, we would like to know which of the possible architectures we should 
employ (according to factors such as their relative simplicity, efficiency, etc.). 

Although he certainly did not prove that reproduction by self-inspection 
could not support open-ended evolution, von Neumann did suggest a number of 
reasons why his genetic architecture would be a more powerful and more gen- 
eral design for this purpose. First of all, as mentioned in Section 1, he noted 
that the essential feature which allowed his automata to overcome the otherwise 
seemingly valid rule that machines are necessarily superior (in size and in organ- 
ization) to their output, was that they contained a general copying automaton B, 
which was capable of copying any linear tape [2] (p. 121). Although B is of fixed,
finite size, it is able to copy a tape of any size. Now, this action of copying a tape 
is essentially reproduction by self-inspection, but this is generally a straightfor- 
ward task for a linear tape. The major problems arise when trying to copy a two- 
or three-dimensional structure by the same method, for example in specifying 
the precise spatial relationships between parts, and in unfolding multidimen- 
sional forms. Von Neumann also pointed out that self-inspection requires that 
we have a representation which is ‘quasi-quiescent’ in the sense that it can be 
read (for the purposes of copying and possibly for interpretation) without being 
essentially disturbed. With a separate genetic description, we only require that 
this description is quasi-quiescent, but copying by self-inspection would require 
that the whole structure to be copied would have this quasi-quiescent property. 
In general, however, most machines would not have this property, nor would 
we want to restrict ourselves to only considering those machines which did. In 
conclusion, von Neumann says: “To sum up, the reason to operate with ‘descrip- 
tions’ . . . instead of the ‘originals’ ... is that the former are quasi-quiescent (i.e. 
unchanging, not in an absolute sense, but for the purposes of the exploration 
that has to be undertaken), while the latter are live and reactive. In the
situation in which we are finding ourselves here, the importance of descriptions is
that they replace the varying and reactive originals by quiescent and (temporar- 
ily) unchanging semantic equivalents and thus permit copying. Copying, as we 
have seen above, is the decisive step which renders self-reproduction (or, more 
generally, reproduction without degeneration in size or level of organization) 
possible” [2] (p. 122-123). 

From a biological perspective, Waddington has made the same point. While
discussing possible reasons for the universal adoption of genetic architectures for 
self-reproduction by biological life, he suggested that the issue “is presumably 
related to the problem [of] how to combine a store which is unreactive enough 
to be reliable, with something which interacts with the environment sufficiently 
actively to be ‘interesting’” [8] (p.118).

McMullin has pointed out that von Neumann’s genetic architecture also ef- 
fectively decouples the geometry of the variational space of the reproducers (i.e. 
the space of the genetic tapes) from the peculiarities of the environment in 
which they exist (i.e. the space of the phenotype) [5] (pp. 191-193). In addition, 
recall from Section 1 that the architecture will accept any tape of the general 
form $\phi(A + B + C + D)$. Assuming the description of D on the tape can
be separated from the description of A, B and C,³ this design guarantees that
mutations which affect the part of the tape describing automaton D will not
interfere with the reproductive capacity of the machine. Machines which
reproduce by self-inspection would generally not have this localisation property.
This being the case, we would not always be able to say that there was a particular
section of the machine which could be disrupted by mutation without interfering
with the machine’s ability to reproduce.

It is interesting to ask whether programs in artificial life platforms such as
Tierra [10] reproduce according to von Neumann’s genetic architecture or rather
by self-inspection. Since the arguments of the previous paragraphs suggest that
there is a marked difference in the evolutionary potential of these two methods,
this is an important question, but it has received little discussion in the
literature. McMullin argued that these programs are reproducing by self-inspection [5]
(p.200). Ibanez and colleagues appear to agree [7] (p.574). In contrast, I would
like to suggest that they can sensibly be analysed in terms of von Neumann’s
genetic architecture. I do not have adequate room to argue the case fully here,
so the following paragraphs are included merely for the purpose of stimulating
discussion on the subject.

³ This is not an inherent property of the architecture per se, but von Neumann’s
analysis of evolvability did assume a ‘compositional’ structure in the language of the
tape descriptions (see Section 1). His cellular automata model [2], and Pesavento’s
recent implementation of a very similar design [9], are existence proofs that it is
possible to build a self-reproducing automaton with such a compositional genetic
structure. Interestingly, however, despite his design for the cellular automata model,
von Neumann also argued that “it is better not to use a description of the pieces and
how they fit together, but rather a description of the consecutive steps to be used in
building the automaton” [2]. In other words, the information should be in the form
of a developmental ‘recipe’ rather than a ‘blueprint’. Further discussion of this topic
can be found in Chapter 7 of [1].

Before I begin, I would like to make a couple of general points, which might 
help to reorient the reader to my perspective. Firstly, I believe that the notion of 
a phenotype fundamentally involves interaction with the environment (and that
this is the essential distinction between the notions of phenotype and genotype,
the latter being an informational concept). When I talk about phenotypes in the
following, therefore, and specifically when I talk about the automata A, B, C and
D, I am interested in the role these phenotypic structures play (their function)
rather than the details of implementation or of how that function is achieved.
Secondly, note that the terminology commonly used to describe reproducers in 
Tierra-like systems is somewhat different to that used for von Neumann’s work. 
Because of the similarity between Tierra-like operating systems and those of 
standard digital computers, the actions of Tierran reproducers are often referred 
to as computations rather than constructions, even when a reproducer is in the 
process of building a new copy of itself. However, this process of reproduction 
is, of course, central to the Tierra approach, and I believe that this procedure of 
building a copy of a program in a different part of memory is, in all the relevant 
details, a process of construction in just the same way as construction processes 
in von Neumann’s cellular automata model. In the following, also remember 
that von Neumann’s general constructing automaton A is the machinery which 
interprets the tape to produce a new machine (phenotype), and the general 
copying automaton B copies the tape uninterpreted. 
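A toy Python sketch may help fix the roles of the automata. It is an illustration only, not von Neumann's cellular-automaton design. The tape is a tuple of hypothetical symbols, and the table `CODE` (an assumption made for this sketch) stands in for the construction rules that A applies. `construct` plays automaton A, `copy_tape` plays automaton B, and `reproduce` plays the control automaton C, which runs both and attaches the copied tape to the new machine.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Tape = Tuple[str, ...]

# Hypothetical "genetic code": tape symbol -> part that automaton A builds.
CODE: Dict[str, str] = {"a": "A", "b": "B", "c": "C", "d": "D"}


@dataclass(frozen=True)
class Machine:
    parts: Tuple[str, ...]  # the constructed phenotype
    tape: Tape              # the quiescent description it carries


def construct(tape: Tape) -> Tuple[str, ...]:
    """Automaton A: interpret each tape symbol and build the corresponding part."""
    return tuple(CODE[symbol] for symbol in tape)


def copy_tape(tape: Tape) -> Tape:
    """Automaton B: copy the tape symbol by symbol, without interpreting it."""
    return tuple(symbol for symbol in tape)


def reproduce(parent: Machine) -> Machine:
    """Automaton C: have A build a new machine from the tape, then attach B's copy."""
    return Machine(parts=construct(parent.tape), tape=copy_tape(parent.tape))


tape = ("a", "b", "c", "d")  # stands for phi(A + B + C + D)
parent = Machine(parts=construct(tape), tape=tape)
child = reproduce(parent)
print(child.parts)      # ('A', 'B', 'C', 'D')
print(child == parent)  # True: no loss of size or organisation
```

Because `copy_tape` never interprets what it copies, the same fixed routine handles a tape of any length. This mirrors the point that a finite copier B can copy an arbitrarily large description.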

At first sight it might seem that there is no separate genetic description 
of the program in a Tierra-like system. The picture is complicated by the fact 
that the machinery which interprets the program (i.e. automaton A) does not 
reside in the same part of the computer in which the program itself is stored. 
The state information for this machinery, a program’s ‘virtual CPU’ (i.e. the
instruction pointer, stacks, registers, etc.), is generally held in an area of
memory separate from the program’s instructions. Furthermore, the actual
‘interpreting machinery’ of the virtual CPU is encoded in the global operating 
system provided by the platform, and is in this sense implicit in the program’s 
environment. Additionally, the control automaton C, which controls when the 
instructions in the program get executed, is also implicit in the part of the op- 
erating system which governs mechanisms such as how a program’s instruction 
pointer is updated after the execution of each instruction. All that is left to be 
explicitly encoded by the program, therefore, is the copying automaton B, and 
potentially any other arbitrary automaton D. 

Now, the instructions which make up the program exist in an unreactive state 
in the system’s random-access memory. It is only when the control automaton 
C transfers instructions to the interpreting automaton A that they become 
‘active’. Looked at in this way, we can see that it is the behaviour of the program 
(including looping, jumping, etc.) that is the result of automaton A interpreting 
the unreactive genetic description. This behaviour is therefore the equivalent of
the constructed machine (and the actions it performs, i.e. the phenotype) in
von Neumann’s design, and the string of instructions residing in the random-access
memory (which is normally referred to as the program) is the tape or
genetic description of this phenotype. It is perhaps easier to see the distinction 
if one considers a parallel program, with multiple processes (with different state 
information) using the same program listing. 
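The parallel-program case can be made concrete with a small Python sketch. The three-instruction language below is hypothetical and is not Tierra's instruction set. The only point is that one instruction listing (the tape) is read by two virtual CPUs holding different state. The interpreter loop, which lives outside the listing, plays the roles of A and C. The two processes show different behaviour from the identical listing.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Instruction = Tuple[str, int]

# One shared, unreactive listing (the "tape"); a hypothetical toy instruction set.
LISTING: List[Instruction] = [
    ("add", 2),          # acc += 2
    ("jump_if_lt", 10),  # if acc < 10, jump back to instruction 0
    ("halt", 0),
]


@dataclass
class VirtualCPU:
    acc: int                                        # accumulator register
    ip: int = 0                                     # instruction pointer
    trace: List[int] = field(default_factory=list)  # record of behaviour


def run(cpu: VirtualCPU, listing: List[Instruction], max_steps: int = 100) -> VirtualCPU:
    """Interpreter (A) plus instruction-pointer update rule (C), both outside the listing."""
    for _ in range(max_steps):
        op, arg = listing[cpu.ip]
        if op == "halt":
            break
        if op == "add":
            cpu.acc += arg
            cpu.trace.append(cpu.acc)
            cpu.ip += 1
        elif op == "jump_if_lt":
            cpu.ip = 0 if cpu.acc < arg else cpu.ip + 1
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return cpu


first = run(VirtualCPU(acc=0), LISTING)
second = run(VirtualCPU(acc=9), LISTING)
print(first.trace)   # [2, 4, 6, 8, 10]
print(second.trace)  # [11]
```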

I therefore suggest that a self-reproducing program in a Tierra-like system 
is consistent with von Neumann’s architecture. However, as automata A and C 
are largely implicit in the environment in which the programs reside (the only 
explicit representation being the state information in a program’s virtual CPU), 
and are certainly not encoded by the individual programs, we can see that a 
‘program’, in the sense of a string of instructions in the system’s random-access 
memory, corresponds to the tape $\phi(B + D)$ in von Neumann’s scheme.

The situation is complicated not only because the interpretation machinery 
resides partly implicitly in the environment, and partly in a different area of 
memory, but also for (at least) one further reason. I am claiming that the string 
of instructions comprising the ‘program’ in random-access memory should be 
viewed as the genetic tape in a von Neumann style self-reproduction architecture. 
Now, von Neumann pointed out that the process of copying the tape in his 
automaton was essentially itself a process of self-inspection. In this sense, Tierran 
programs do reproduce by self-inspection. However, the overall mechanism for
reproduction, including the implicit encodings of the interpretation and control 
automata, fits in with von Neumann’s architecture, in which the copying of the 
tape by self-inspection is an integral feature. The major consequence of this is 
that programs in Tierra-like systems should, all else being equal, have similar 
evolutionary potential to von Neumann’s self-reproducing automata, because 
extra instructions can be added to the end of the ‘tape’ and subject to mutations. 
As long as the mutations do not affect that part of the tape which encodes 
the self-reproduction algorithm, they will be inherited without disrupting the 
capacity of the program to reproduce. 
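The inheritance argument can be checked in a deliberately simplified Python toy. Everything here is an assumption made for illustration. The first four symbols of the hypothetical listing stand for the self-reproduction algorithm (B), the rest stand for D, and the environment's interpreter reproduces a program only if its B section is intact. Under these assumptions every point mutation in D is inherited, while every point mutation in B prevents reproduction. In a real Tierra-like system some mutations in B would of course be neutral.

```python
import random
from typing import List, Optional

# Hypothetical toy: these symbols encode the copying procedure B.
COPY_ROUTINE: List[str] = ["find_size", "allocate", "copy_loop", "divide"]
ALPHABET: List[str] = COPY_ROUTINE + ["nop0", "nop1", "add", "sub"]


def reproduce(program: List[str]) -> Optional[List[str]]:
    """Environment-side interpreter (A and C): run B if it is intact, else fail."""
    if program[:len(COPY_ROUTINE)] != COPY_ROUTINE:
        return None  # copying machinery disrupted, so no offspring
    return list(program)  # B copies the whole listing, D included, uninterpreted


def point_mutation(program: List[str], site: int, rng: random.Random) -> List[str]:
    """Replace the symbol at `site` with a different, randomly chosen symbol."""
    mutant = list(program)
    mutant[site] = rng.choice([s for s in ALPHABET if s != program[site]])
    return mutant


rng = random.Random(0)
parent = COPY_ROUTINE + ["add", "nop0", "sub"]  # stands for phi(B + D)
for site in range(len(parent)):
    mutant = point_mutation(parent, site, rng)
    child = reproduce(mutant)
    region = "B" if site < len(COPY_ROUTINE) else "D"
    print(site, region, child == mutant)  # was the mutant inherited intact?
```

Sites 0 to 3 (region B) print `False` and sites 4 to 6 (region D) print `True`, whichever random symbols are drawn.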

3.3 Implicit versus Explicit Encoding 

The preceding arguments have led us to consider the question of implicit versus 
explicit encoding of automata. However, rather than the general question that 
has been the subject of much debate relating to trivial versus non-trivial re- 
production, here we are interested in rather more specific questions relating to 
von Neumann’s architecture. Now, as we are interested in the evolution of these 
self-reproducing machines, and as the inheritable information of each machine
(i.e. the part which gets passed on from parent to offspring) is contained on
the tape $\phi$, I will assume that the tape must be explicitly represented in some
fashion; otherwise there would be nothing which could evolve. We can now ask
which parts of the [A + B + C + D] architecture are explicitly encoded on the
tape $\phi$, and which are implicit in the environment. Of course, even the
behaviour of those parts which are represented on the tape will still to some extent
be encoded in the ‘laws of physics’ of the environment, but I think the analysis 
is nevertheless worthwhile. 
Considering von Neumann’s architecture for a self-reproducing automaton, it
is clear that all four subcomponents, A, B, C and D, are very explicitly encoded
on the tape $\phi(A + B + C + D)$; the environment in which the automaton exists
implicitly encodes only very low-level actions in the form of the local transition
rules of individual cells. The analysis of self-reproducing programs in Tierra-like
systems above suggests that in these systems, B and D are explicitly encoded on
the tape $\phi(B + D)$, but A and C are implicitly encoded in the environment (the
operating system). Notice that with this design the ‘genetic code’ which maps
the genotype $\phi(B + D)$ to the phenotype [B + D] cannot itself evolve, because
the interpretation automaton A is not encoded on the tape.

It is interesting to speculate on what information we might desire to be 
explicitly encoded on a structure which would be suitable for acting as a robust 
initial seed for an open-ended evolutionary process. I will refer to such a structure 
as ‘proto-DNA’. Now, we would like our proto-DNA to be an indefinite hereditary 
replicator if it is to be such a seed. In other words, it should be able to exist in an 
unlimited number of configurations which retain the ability to reproduce. If the 
copying process is encoded on the tape itself, then mutations have the potential 
to disrupt its ability to be reproduced. It would therefore seem desirable that 
the copying automaton B of our proto-DNA be largely implicitly encoded in the 
environment. Note that this would not necessarily prevent a more complicated, 
and possibly more reliable, explicit copying process B' (genetically encoded as
$\phi(B')$) later evolving from (but still based upon) the simpler implicit process,
as indeed seems to have happened during biological evolution. 

If the copying procedure for our proto-DNA is implicitly encoded in the 
environment, however, any configuration of proto-DNA would, all else being 
equal, be able to reproduce as well as any other. In other words, there would be 
no basis for preferentially selecting some configurations over others, and therefore 
no basis for an evolutionary process. Specific configurations of proto-DNA must 
therefore have some specific properties that are selectively significant. Models 
of the origin of life commonly presume that these simple phenotypic properties 
were things such as increased stability of the molecule, simple control of the local 
environment, catalytic activity, etc. (e.g. [11], [12], [13]). 

At the initial stages of an evolutionary process, however, we would not expect
there to be mechanisms for explicitly decoding the proto-DNA; in other words,
the interpretation machinery A is implicit. This means that particular
configurations of proto-DNA should have some specific phenotypic properties (such
as the ability to act as catalysts) which can be determined directly from their
structure rather than having to be explicitly decoded from the genotype. All we
require of the proto-DNA, therefore, is that particular configurations have
particular phenotypes associated with them, which (a) are not related to the
process of self-reproduction per se, and (b) do not need to
be decoded by an explicit interpretation automaton A.⁴ Regarding the kinds of
simple phenotypes that we might wish to be available to our proto-DNA, the
possibilities seem endless. Graham Cairns-Smith observes: “It is almost too easy
to imagine possible uses for phenotype structures — because the specification for
an effective phenotype is so sloppy. A phenotype has to make life easier or less
dangerous for the genes that (in part) brought it into existence. There are no
rules laid down as to how this should be done” [12] (p.106). If more complicated
phenotypes are to arise later on in the evolutionary process, however, we require
that the proto-DNA at least has the potential for explicit interpretation
machinery A' and control machinery C' to become associated with it. This would
involve some form of specific reaction to subsections of information in the
proto-DNA, but more work is needed to identify fully how this potential for
explicit interpretation might be assured.

⁴ I am also assuming here that the domain of interaction of these phenotypes is within
the environment shared by the evolving population (i.e. the phenotypes can act
upon other biotic or abiotic components in the environment). This is in contrast
to models such as genetic algorithms, where the replicators do not directly interact
with other replicators and selection is determined by an extrinsic fitness function
(thereby limiting the potential for open-ended evolution).

4 Discussion

It has been argued that programs in artificial life platforms such as Tierra
conform to von Neumann’s genetic architecture for self-reproducing machines.
Specifically, the listing of such a program corresponds to the tape $\phi(B + D)$. In
contrast, the analysis of the desirable properties of proto-DNA (i.e. a class of
object capable of acting as a seed for an open-ended evolutionary process)
suggests that such an object would correspond to the tape $\phi(D)$ only. The fact that
the copying process B is explicitly encoded in Tierran programs means that it is
susceptible to disruption by mutations and perturbations from the environment.
This is why the interactions between programs in platforms such as this have
to be restricted; in Tierra, for example, direct interaction between programs is
restricted to the reading by one program of the instructions of its neighbours.

It is likely that many of the more interesting ecological and evolutionary
phenomena in the biosphere arise because organisms are able to interact in
much richer ways. Most importantly, biological organisms are embedded in a
material world, and therefore represent useful resources of matter and energy
for potential use by other organisms. Without such an unrestricted range of
allowable interactions, and without the system being grounded on a material
basis (i.e. where organisms are composed of structural units which are, at their
lowest level, conserved, and which are in limited supply), it is doubtful whether
any selection pressure can exist for organisms to evolve properties such as
self-maintenance. Also, it is only with such a material grounding that ecological
phenomena such as food webs and trophic levels can be realised. If we wish to
allow artificial life models the capacity to evolve in these ways, we must model a
material environment, and allow the individual organisms much more freedom in
their interactions. However, if we were to model organisms as self-reproduction
algorithms (as in Tierra) in such an environment, they would prove very brittle,
because the explicitly-encoded copying process B would be very easily disrupted.

The advantage of the proposed proto-DNA structure, where B is implicit in
the environment, is that we can relax the restrictions on interactions between self- 
reproducing machines while maintaining the robustness of the individuals. Such a 
system would therefore have the potential for evolving much richer ecologies and 
symbioses. This approach brings with it many new issues which will have to be 
addressed, such as what range of phenotypes should be available to the proto- 
DNA structures, and how to ensure that they have the potential for evolving 
more explicit interpretation machinery to encode more complicated phenotypes. 
The approach therefore involves a shift of focus away from the process of self- 
reproduction per se, towards questions relating to phenotypes, the environment, 
and interactions between organisms. Progress in these directions might give us a 
better insight into the essential requirements for biological evolution, and might 
also allow us to build artificial life models with improved evolutionary potential. 



Acknowledgements 

Thanks to my PhD supervisor, John Hallam, and to the anonymous reviewers, for their 
useful comments and criticisms. Thanks also to EPSRC for providing financial support. 



References 

1. Taylor, T.: From Artificial Evolution to Artificial Life. PhD thesis, Division of 
Informatics, University of Edinburgh (1999) 

2. von Neumann, J.: The Theory of Self-Reproducing Automata. University of Illinois 
Press, Urbana, Ill. (1966)

3. Turing, A.M.: On computable numbers, with an application to the
Entscheidungsproblem. Proc. London Mathematical Soc., Series 2 42 (1936) 230-265

4. Langton, C.G.: Self-reproduction in cellular automata. Physica D (1984) 135-144 

5. McMullin, B.: Artificial Knowledge: An Evolutionary Approach. PhD thesis, 
Department of Computer Science, University College Dublin (1992) 

6. Laing, R.: Automaton models of reproduction by self-inspection. Journal of The- 
oretical Biology 66 (1977) 437-456 

7. Ibanez, J., Anabitarte, D., Azpeitia, I., Barrera, O., Barrutieta, A., Blanco, H.,
Echarte, F.: Self-inspection based reproduction in cellular automata. In Moran, F.,
Moreno, A., Merelo, J., Chacon, P., eds.: Third European Conference on Artificial 
Life, Springer (1995) 564-576 

8. Waddington, C.: Paradigm for an evolutionary process. In Waddington, C., ed.: 
Towards a Theoretical Biology. Volume 2. Edinburgh Univ. Press (1969) 106-128 

9. Pesavento, U.: An implementation of von Neumann’s self-reproducing machine. 
Artificial Life 2 (1995) 337-354 

10. Ray, T.S.: An approach to the synthesis of life. In Langton, C., Taylor, C.,
Farmer, J., Rasmussen, S., eds.: Artificial Life II. Addison-Wesley, Redwood City,
CA (1991) 371-408

11. Eigen, M., Schuster, P.: The hypercycle: A principle of natural self-organization. 
Die Naturwissenschaften 64 (1977) 541-565 

12. Cairns-Smith, A.: Seven Clues to the Origin of Life. Cambridge Univ. Press (1985) 

13. Szathmary, E., Demeter, L.: Group selection of early replicators and the origin of
life. Journal of Theoretical Biology 128 (1987) 463-486 




Some Techniques for the Measurement of 
Complexity in Tierra 



Russell K. Standish 



High Performance Computing Support Unit 
University of New South Wales 
Sydney, 2052 
Australia 

R.Standish@unsw.edu.au 

http://parallel.hpc.unsw.edu.au 



Abstract. Recently, Adami and coworkers have been able to measure the
information content of digital organisms living in their Avida artificial life
system. They show that over time, the organisms behave like Maxwell’s
demon, accreting information (or complexity) as they evolve. In Avida the
organisms don’t interact with each other, but merely reproduce at a particular
rate (their fitness), and attempt to evaluate an externally given arithmetic
function in order to win bonus fitness points. Measuring the information
content of a digital organism is essentially a process of counting the
number of genotypes that give rise to the same phenotype.

Whilst Avidan organisms have a particularly simple phenotype, Tierran
organisms interact with each other, giving rise to an ecology of
phenotypes. In this paper, I discuss techniques for comparing pairs of Tierran
organisms to determine if they are phenotypically equivalent. I then
discuss a method for computing an estimate of the number of phenotypically
equivalent genotypes that is more accurate than the “hot site” estimate
used by Adami’s group. Finally, I report on an experimental analysis of
a Tierra run.



1 Introduction 

The issue of what happens to complexity in an evolving system is of great in- 
terest. In natural (biological) evolution, the naive view is that life started simple, 
and evolved ever more complex life forms over time, leading to that pinnacle of
complexity, Homo sapiens. The end points of that process are of course fixed.
In the beginning, life must be simple. In our present era, there must exist in- 
telligent organisms (namely us) pondering over the mystery of how we came to 
be. So the anthropic principle fixes the present day as having complex lifeforms. 
There is nothing within the Modern Synthesis of Darwinism that implies a steady
interpolation between these two end points. In fact it is even plausible that more 
complex organisms than us existed in the past, but have since vanished into ob- 
scurity. However, examinations of the fossil record over the Phanerozoic (the last 
550 million years of the Earth’s history) indicate almost no growth in complexity 
by a number of different measures over that period, apart from an initial large
jump at the Cambrian explosion [1].
The interesting thing is to ask what one might see if looking at another
evolutionary system apart from the one in which we evolved. Would we see any growth
in complexity at all? Since we don’t have an extra terrestrial biology to observe (a 
few Martian meteorites aside), the only other systems available are Artificial Life 
systems evolving within a digital computer such as Tierra or Avida. The Avida 
group has reported measuring the information content (complexity) of individual
Avidan organisms [2], or rather a lower bound of the organism’s complexity. Their
results are that this lower bound increases over time for the maximally fit organ- 
ism, thus showing information accumulating as time progresses. One important 
critique of this work, however, is that organisms do not interact directly with 
each other, and in order to prevent evolution stagnating, an externally imposed 
task (e.g. computing a logical operation) is added to the system. Organisms are
given “fitness points” depending on how well they perform this task. This heavily
weights the system in favour of accruing information.

By contrast, in the Tierra system, the organisms interact with each other, 
providing a rich array of possible (intrinsic) tasks for the organisms to exploit. 
Since this is an evolving ecology with no externally imposed task, the above cri- 
tique does not apply. However, the downside is that determining whether two 
genotypes are phenotypically equivalent is considerably more complex. In some 
work a couple of years ago[3], I studied the phenotypic properties of Tierran 
organisms to build up a picture of the genotype to phenotype landscape. A Tier- 
ran organism’s phenotype can be characterised by a couple of numbers for each 
possible pairwise interaction in the ecology. Multiway interactions are ignored in 
this study, as experience has shown them to be relatively rare. 

2 Complexity of a Digital Organism 

The information content of a string is given by the difference between the maximal 
Shannon entropy of that string (i.e. considering the string to be random, or devoid 
of information), and the entropy given by assuming that the string codes for some 
phenotype p [2, 4]:

$$I(g) = H(g) - H(g \mid p) = \ell - \log_{32} N \qquad (1)$$

where $\ell$ is the length of the genotype (in instructions), and $N$ is the number of
genotypes that give rise to the same phenotype $p$. The base, 32, refers to the
number of instructions in the Tierra instruction set. If $N \approx 32^{\ell}$ (i.e. a
completely random sequence), then $I(g) \approx 0$. Similarly, if $N = 1$ (there is only
one genetic sequence encoding the phenotype, i.e. no redundancy), then $I(g) = \ell$.
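Equation (1) is straightforward to evaluate once $N$ is known. The helper below is our own sketch: the function name and the range check are illustrative and do not come from [2]. It expresses $I(g)$ in base-32 digits, i.e. in units of instructions, and relies on Python's `math.log`, which accepts arbitrarily large integers.

```python
import math


def information_content(length: int, n_equivalent: int, alphabet: int = 32) -> float:
    """I(g) = ell - log_alphabet(N), measured in instructions (base-32 digits)."""
    if not 1 <= n_equivalent <= alphabet ** length:
        raise ValueError("N must lie between 1 and alphabet ** length")
    return length - math.log(n_equivalent, alphabet)


print(information_content(80, 1))         # 80.0: no redundancy
print(information_content(80, 32 ** 20))  # about 60: 20 freely varying sites
print(information_content(80, 32 ** 80))  # about 0: effectively a random string
```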

The most obvious way to compute $N$ is to search all $32^{\ell}$ genotypes for
equivalent phenotypes. However, this is an enormous number of strings to check,
and computationally infeasible. Adami recognised this problem, and took the
approach of counting the number of volatile sites $v$ (sites that vary amongst
phenotypic equivalents), and approximating $N \approx 32^{v}$. In one sense this is an
overestimate of $N$, so they argue that this gives a lower bound to the information
$I(g)$. In another sense, however, it is not strictly a lower bound. If it turns out
that fixing one of the volatile sites to a particular value allows one of the fixed
sites to vary without altering the phenotype, then this would not be counted
in $N$, so what we have is really an overestimate of an underestimate.
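The hot-site estimate itself can be sketched as follows. The comparison `same_phenotype` is a placeholder assumption: in Tierra it would be the tournament-based test of Section 3. The demonstration instead uses a hypothetical toy phenotype that depends only on the first three sites. A site counts as volatile if at least one substitution there is neutral, and the estimate is then $I(g) \approx \ell - v$, the non-volatile site count.

```python
from typing import Callable, List, Sequence

PhenotypeTest = Callable[[Sequence[int], Sequence[int]], bool]


def volatile_sites(genome: Sequence[int], same_phenotype: PhenotypeTest,
                   alphabet: int = 32) -> List[int]:
    """Sites at which at least one single-site substitution leaves the phenotype unchanged."""
    volatile = []
    for site, original in enumerate(genome):
        for symbol in range(alphabet):
            if symbol == original:
                continue
            mutant = list(genome)
            mutant[site] = symbol
            if same_phenotype(genome, mutant):
                volatile.append(site)
                break  # one neutral substitution is enough to mark the site volatile
    return volatile


def hot_site_information(genome: Sequence[int], same_phenotype: PhenotypeTest,
                         alphabet: int = 32) -> int:
    """Hot-site estimate: N ~ alphabet**v, so I(g) ~ ell - v."""
    return len(genome) - len(volatile_sites(genome, same_phenotype, alphabet))


# Hypothetical toy phenotype: only the first three instructions matter.
def toy_same(a: Sequence[int], b: Sequence[int]) -> bool:
    return list(a[:3]) == list(b[:3])


genome = [5, 17, 2, 9, 9, 30, 0, 11]
print(volatile_sites(genome, toy_same))        # [3, 4, 5, 6, 7]
print(hot_site_information(genome, toy_same))  # 3
```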

The same criticism applies to this work. We can compute the above-mentioned
estimate fairly accurately; more precisely, we can find the size of the neutral
network [5, 6, 7] connected by one-site neutral mutations to g. However, the
possibility remains that there are other neutral networks of g that aren’t con- 
nected by single site mutations to g. Probably the most efficient way of finding 
these is by using a genetic algorithm to explore genotype space, i.e. run Tierra 
for a long time to see what it discovers! The way we use this in our experiment 
is to keep a list of neutrally equivalent organisms that Tierra discovers. As we 
explore the neutral network connected to g, we eliminate items from the list that 
we come across. The remaining names on the list can then be used as seeds to 
start the process again. 
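The neutral-network walk and the seed list can be sketched as a breadth-first search. This is an illustrative reconstruction, not the code used for this paper. Genomes are tuples of instruction indices, and `same_phenotype` is again a placeholder for the tournament comparison. The tiny binary alphabet in the demonstration is chosen only so that the result is easy to check by hand. With a hypothetical parity phenotype, the two even-parity genomes of length 2 share a phenotype but are not linked by any single neutral mutation, so only the seed list finds both.

```python
from collections import deque
from typing import Callable, Iterable, List, Set, Tuple

Genome = Tuple[int, ...]
PhenotypeTest = Callable[[Genome, Genome], bool]


def neutral_component(start: Genome, same_phenotype: PhenotypeTest,
                      alphabet: int = 32) -> Set[Genome]:
    """All genomes reachable from `start` through single-site neutral mutations."""
    seen = {start}
    queue = deque([start])
    while queue:
        g = queue.popleft()
        for site in range(len(g)):
            for symbol in range(alphabet):
                if symbol == g[site]:
                    continue
                mutant = g[:site] + (symbol,) + g[site + 1:]
                if mutant not in seen and same_phenotype(start, mutant):
                    seen.add(mutant)
                    queue.append(mutant)
    return seen


def neutral_networks(seeds: Iterable[Genome], same_phenotype: PhenotypeTest,
                     alphabet: int = 32) -> List[Set[Genome]]:
    """Walk from each seed in turn, striking off seeds that an earlier walk reached."""
    remaining = list(seeds)
    components: List[Set[Genome]] = []
    while remaining:
        component = neutral_component(remaining.pop(0), same_phenotype, alphabet)
        components.append(component)
        remaining = [s for s in remaining if s not in component]
    return components


# Hypothetical toy phenotype: parity of the genome (binary alphabet).
def same_parity(a: Genome, b: Genome) -> bool:
    return sum(a) % 2 == sum(b) % 2


nets = neutral_networks([(0, 0), (1, 1)], same_parity, alphabet=2)
print([sorted(c) for c in nets])  # [[(0, 0)], [(1, 1)]]
print(sum(len(c) for c in nets))  # 2 equivalent genotypes in total
```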

In this work, we use two different techniques to measure $N$. The first is a
Monte Carlo random sampling technique to estimate the proportion of the $32^{v}$
strings found by varying the volatile sites that are phenotypically equivalent to
g. The second technique, which we use
in conjunction with the Monte Carlo approach mentioned above, is to walk the 
neutral net. The Monte Carlo technique works well when the density of neutral 
variants is fairly high, whereas the latter technique is best on sparse networks. A 
decision on which technique to use for which site is based on estimated densities 
of neutral variants. 
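A Monte Carlo estimate of $N$ over the volatile sites might look like the sketch below; names and defaults are illustrative assumptions. It samples random assignments to the volatile sites, measures the fraction that leave the phenotype unchanged, and returns $\log_{32} N \approx v + \log_{32}(\text{fraction})$. Working with the logarithm avoids overflowing a float when $32^{v}$ is large. A zero hit count signals a network too sparse for sampling, which is where walking the neutral net takes over.

```python
import math
import random
from typing import Callable, Optional, Sequence

PhenotypeTest = Callable[[Sequence[int], Sequence[int]], bool]


def monte_carlo_log_n(genome: Sequence[int], volatile: Sequence[int],
                      same_phenotype: PhenotypeTest, alphabet: int = 32,
                      samples: int = 10_000,
                      rng: Optional[random.Random] = None) -> Optional[float]:
    """Estimate log_alphabet(N) by randomly re-assigning the volatile sites."""
    rng = rng or random.Random()
    hits = 0
    for _ in range(samples):
        mutant = list(genome)
        for site in volatile:
            mutant[site] = rng.randrange(alphabet)
        if same_phenotype(genome, mutant):
            hits += 1
    if hits == 0:
        return None  # too sparse to sample: walk the neutral network instead
    return len(volatile) + math.log(hits / samples, alphabet)


# Hypothetical toy phenotype: site 0 plus the sum of the other sites modulo 4,
# so about a quarter of random assignments to sites 1..5 are neutral.
def toy_same(a: Sequence[int], b: Sequence[int]) -> bool:
    return a[0] == b[0] and sum(a[1:]) % 4 == sum(b[1:]) % 4


genome = [1, 0, 3, 2, 2, 1]
estimate = monte_carlo_log_n(genome, [1, 2, 3, 4, 5], toy_same, alphabet=4,
                             rng=random.Random(42))
print(estimate)  # close to 4, since N = 4**5 / 4 = 4**4
```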

3 Establishing Phenotypic Equivalence 

Equation (4) of [3] presents the dynamical equations of two species of Tierran 
organisms interacting. The precise form of the dynamics is not important here;
however, the phenotype of the organism can be characterised by its interactions
with all other possible Tierran phenotypes. Since it is impossible to have the
complete set of all possible Tierran organisms, those organisms generated during 
a run of Tierra are used. Since Tierran organisms coevolve, the most important 
organisms should be contemporaneous with the test organism. The following 
characteristics are saved for each pair of organisms: 

1. The outcome of the tournament. This may be one of the following:
   infertile: The test organism never calls the divide instruction, or does not
      produce any recognisable progeny (essentially stillborn).
   once: The organism produces progeny once, but then never repeats the act.
   repeat: The organism continuously reproduces the same progeny. For this
      purpose we ignore what is produced the first time around, as this will be
      swamped by the number of later progeny.
   nonrepeat: The organism continuously reproduces, but the progeny is either
      different each time, or the CPU is in a different state each time the divide
      instruction is called, so it cannot be guaranteed to reproduce ad infinitum.

2. The name of the progeny organism. This is usually identical to the parent,
   but may be another type in the case of symbiosis or parasitism.
3. The number of timesteps it takes to reach the first divide instruction ($\sigma_{ij}$),
   and the time it takes between successive divide steps after that ($\tau_{ij}$).

4. The number of template matching operations made to the opposing organism
   prior to the first divide ($\mu_{ij}$) and between successive divides ($\nu_{ij}$).

Two organisms are neutrally equivalent if they have identical characteristics 
against all Tierran organisms. Once all organisms are paired with each other, we 
can produce a list of phenotypically unique organisms, which provides a smaller 
test list to pit trial mutants against. We may also eliminate some noninteractive 
pairings prior to simulation by trying to see if potential template matches could 
happen between organisms. This still produces a fairly large list of test organisms, 
so it is still computationally expensive. The high degree of parallelism in this 
problem allows it to be attacked in reasonable time on a parallel supercomputer. 

A further refinement may be possible by producing an archetypal list, perhaps
by ignoring the ($\mu$, $\nu$, $\tau$ and $\sigma$) parameters. The idea is that the
archetypes would contain a representative organism from each niche of the ecology,
ignoring minor differences such as reproductive rate. This would coarsen the
approximation a little, but would probably give an acceptable result. At present
this idea has not been tested.
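The bookkeeping behind the equivalence test, together with the untested archetype idea, can be sketched as follows. The `tournament` callable is a hypothetical stand-in for running a pair of organisms in a Tierra soup and recording the characteristics listed above. Here it is faked with a lookup table of made-up organism names. We also assume that progeny identical to the parent is recorded as `"self"`, so that distinct self-replicators can still compare equal. Exact classes compare every characteristic against every test organism. The archetype key keeps only the outcome and the progeny, discarding $\mu$, $\nu$, $\tau$ and $\sigma$.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Sequence, Tuple


class Outcome(Enum):
    INFERTILE = "infertile"
    ONCE = "once"
    REPEAT = "repeat"
    NONREPEAT = "nonrepeat"


@dataclass(frozen=True)
class Interaction:
    outcome: Outcome
    progeny: str  # progeny name; assumed "self" when identical to the parent
    sigma: int    # timesteps to the first divide
    tau: int      # timesteps between successive divides
    mu: int       # template matches on the opponent before the first divide
    nu: int       # template matches between successive divides


Tournament = Callable[[str, str], Interaction]


def profile(name: str, tests: Sequence[str], tournament: Tournament,
            archetype: bool = False) -> Tuple:
    """Characteristics of `name` against every test organism (coarse if archetype)."""
    rows = [tournament(name, t) for t in tests]
    if archetype:
        return tuple((r.outcome, r.progeny) for r in rows)  # drop mu, nu, tau, sigma
    return tuple(rows)


def phenotype_classes(names: Sequence[str], tests: Sequence[str],
                      tournament: Tournament, archetype: bool = False) -> List[List[str]]:
    """Group organisms whose profiles are identical against all test organisms."""
    groups: Dict[Tuple, List[str]] = {}
    for n in names:
        groups.setdefault(profile(n, tests, tournament, archetype), []).append(n)
    return list(groups.values())


# Hypothetical lookup table standing in for real Tierra tournaments.
TABLE = {
    ("x1", "t1"): Interaction(Outcome.REPEAT, "self", 800, 700, 3, 2),
    ("x2", "t1"): Interaction(Outcome.REPEAT, "self", 800, 700, 3, 2),
    ("x3", "t1"): Interaction(Outcome.REPEAT, "self", 900, 750, 3, 2),
    ("x4", "t1"): Interaction(Outcome.INFERTILE, "none", 0, 0, 0, 0),
}
names = ["x1", "x2", "x3", "x4"]
exact = phenotype_classes(names, ["t1"], lambda a, b: TABLE[(a, b)])
coarse = phenotype_classes(names, ["t1"], lambda a, b: TABLE[(a, b)], archetype=True)
print(exact)   # [['x1', 'x2'], ['x3'], ['x4']]
print(coarse)  # [['x1', 'x2', 'x3'], ['x4']]
```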

4 Interim Results 

Due to the time constraints of producing this paper, the analysis of a reasonable-length Tierra run has not been completed. At the time of writing, a moderately large data set of 1660 organisms was generated from a 24-hour Tierra run. Tierra produces most of its diversity during the earliest stage of its running, so it becomes significantly more expensive to produce larger data sets. This data set was halved by removing every second organism, and then a phenotypic analysis was carried out. The halved set reduced to 103 distinct phenotypes, which formed the test list used for carrying out the complexity analysis. Each of these 103 organisms was then tested for phenotypic equivalence against its single-site nearest neighbours. The number of sites at which no mutation resulted in a phenotypically equivalent organism ("nonvolatile sites") is plotted against the time of speciation in Fig. 1.
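The nonvolatile site count for one organism can be sketched as below. This is a hedged illustration, assuming a hypothetical `phenotype(genome)` callable that returns a hashable summary of the organism's behaviour against the test list (in Tierra this would require running each mutant), and treating a genome as a sequence of instruction codes drawn from an alphabet of `alphabet_size` symbols. A site counts as nonvolatile only if every single-site substitution there changes the phenotype.

```python
from typing import Callable, Hashable, Sequence


def nonvolatile_sites(genome: Sequence[int], alphabet_size: int,
                      phenotype: Callable[[tuple], Hashable]) -> int:
    """Count sites at which no single-site mutant is phenotypically equivalent."""
    original = tuple(genome)
    target = phenotype(original)
    count = 0
    for site, current in enumerate(original):
        neutral_found = False
        for allele in range(alphabet_size):
            if allele == current:
                continue  # only genuine substitutions are nearest neighbours
            mutant = original[:site] + (allele,) + original[site + 1:]
            if phenotype(mutant) == target:
                neutral_found = True  # this site tolerates at least one mutation
                break
        if not neutral_found:
            count += 1
    return count
```

For example, if only the first instruction affected the phenotype, a three-instruction genome would have exactly one nonvolatile site.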

References 

1. McShea, D.W.: Metazoan complexity and evolution: Is there a trend? Evolution 50 
(1996) 477-492 

2. Adami, C.: Introduction to Artificial Life. Springer, New York (1998) 

3. Standish, R.K.: Embryology in Tierra: A study of a genotype to phenotype map. 
Complexity International 4 (1997) http://www.csu.edu.au/ci. 

4. Layzer, D.: Growth and order in the universe. In Weber, B., Depew, D., Smith, J., eds.: Entropy, Information and Evolution. MIT Press, Cambridge, Mass. (1988) 23-39








Fig. 1. Non-volatile site count (complexity estimate) for the set of phenotypic Tierran species, as a function of speciation time.



5. Kauffman, S.: At Home in the Universe: The Search for Laws of Complexity. Oxford UP, New York (1995)

6. Reidys, C., Kopp, S., Schuster, P.: Evolutionary optimization of biopolymers and sequence structure maps. In Langton, C., Shimohara, K., eds.: Artificial Life V, MIT Press (1997) 379

7. Schuster, P.: Landscapes and molecular evolution. Physica D 107 (1997) 351-365





A Generic Neutral Model for Quantitative 
Comparison of Genotypic Evolutionary Activity 



Andreas Rechtsteiner¹ and Mark A. Bedau²

¹ Systems Science Ph.D. Program, Portland State University
P.O. Box 751, Portland OR 97202, USA
andreas@sysc.pdx.edu, http://www.sysc.pdx.edu/Alife/aendy.html
² Reed College, 3203 SE Woodstock Blvd, Portland OR 97202, USA
mab@reed.edu, http://www.reed.edu/~mab



Abstract. We use a new general-purpose model of neutral evolution of genotypes to make quantitative comparisons of diversity and adaptive evolutionary activity as a function of mutation rate among two versions of Packard’s Bugs model and their neutral shadows. Comparing diversity and evolutionary activity of all these models across the mutation rate spectrum shows that the generic neutral model may have broad applicability in discovering quantitative laws involving adaptive evolutionary activity in different evolving systems.



1 The Need for a Generic Neutral Model 

Adaptive evolution is thought to produce much of the order and functionality 
evident in complex systems [9,7,5], but it is often difficult to distinguish adap- 
tive change from other evolutionary phenomena such as random genetic drift 
and architectural necessity [6, 10], and some even question whether adaptations 
can be objectively identified at all [6]. Recent progress on identifying adaptive 
evolutionary phenomena includes Bedau and Packard’s statistical methods for 
measuring adaptive evolutionary activity. Here, we apply these methods to the 
problem of determining how adaptive evolutionary activity depends on mutation 
rate. Our ultimate aim is to develop methods for objectively identifying and mea- 
suring adaptive evolutionary activity in all evolutionary systems, both natural 
and artificial, so that we can seek universal laws of adaptive evolutionary activ- 
ity. Here, we test such a method, applied at the level of whole genotypes, in the 
context of two simple models of sensory-motor evolution. But the same method 
can be applied at other levels of analysis in other evolutionary systems. The 
ultimate significance of this work comes from the possibility of quantitatively 
comparing evolutionary adaptations across all evolving systems. 

The centerpiece of our method is Bedau and Packard’s evolutionary activity 
statistics. (We also measure system diversity, D, which is simply the number of 
different genotypes present in a system at a given time.) Detailed definitions and 
motivations for evolutionary activity statistics are readily available elsewhere [2, 1, 3, 4, 13]. Evolutionary activity statistics aim to identify evolutionary innovations (here, new genotypes) that persist and continue to play a significant role in a system because of their adaptive value. These statistics fall into two broad classes: those reflecting evolutionary activity’s extent and those reflecting its intensity. Here, we attend only to the extent of evolutionary activity, measuring it with mean cumulative evolutionary activity (sometimes simply called “activity”), $A$, which in the present context is operationalized as the mean age of the genotypes present in a system at a given time. So, in this context, the higher a system’s mean activity, the higher the mean age of the system’s genotypes, which means the greater the continual adaptive success of those genotypes. Intuitively, the extent of evolutionary activity concerns how much adaptive structure is present in a system; one might refer to this as the continual adaptive success of the system’s components. By contrast, the intensity of evolutionary activity reflects the rate at which new adaptive structure is being created. The extent and intensity of adaptive evolutionary activity are independent. For example, if a population of highly adaptive genotypes persists indefinitely without changing and no new genotypes invade the system, then the extent of evolutionary activity is positive and perhaps grows over time, but the intensity of evolutionary activity falls to nil.
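The two quantities used in this paper, diversity $D$ and mean activity $A$, can be computed from a single snapshot as in the minimal sketch below. It assumes (our assumption, for illustration) that each genotype’s origin time, the time it most recently arose, has been recorded, and that the mean is taken over distinct genotypes rather than over individuals.

```python
from typing import Dict, Hashable, Iterable, Tuple


def diversity_and_activity(population: Iterable[Hashable],
                           origin_time: Dict[Hashable, int],
                           now: int) -> Tuple[int, float]:
    """Return (D, A) for one snapshot of the system.

    D: number of distinct genotypes present.
    A: mean age (now - origin time) over those distinct genotypes.
    """
    present = set(population)
    diversity = len(present)
    if diversity == 0:
        return 0, 0.0
    activity = sum(now - origin_time[g] for g in present) / diversity
    return diversity, activity
```

For a population `["a", "a", "b"]` where `a` arose at time 0 and `b` at time 6, a snapshot at time 10 gives $D = 2$ and $A = (10 + 4)/2 = 7$.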

To ensure that evolutionary activity statistics reflect the adaptive success of 
the genotypes and not non-adaptive evolutionary forces like chance and neces- 
sity, one must use non-adaptive evolutionary systems called “neutral models” as 
null hypotheses. That is, one must screen off the effects of non-adaptive evolu- 
tionary forces like chance by comparing the evolutionary dynamics observed in 
target evolutionary systems with those observed in analogous neutral models. 
Such neutral analogues have heretofore been constructed by crafting systems that 
“shadow” the target system in all relevant respects except that a shadow geno- 
type’s presence or concentration or longevity cannot be due to the genotype’s 
adaptive significance [3,4]. Since neutral shadows are tailored to target systems, 
they sharply show the target systems’ deviation from the no-adaptation null 
hypothesis. But neutral shadows have significant drawbacks, too, for studying 
a new target system involves constructing and studying a new neutral shadow, 
and it is vexing to make meaningful quantitative comparisons among different 
tailor-made neutral shadows. 

The obvious way to solve these problems is to create a generic neutral 
model — one neutral model that can approximate many different neutral shad- 
ows. The immediate goal of this paper is to define such a generic neutral model 
and test its usefulness for quantifying evolutionary activity across different sys- 
tems. We pursue this goal by comparing the generic neutral model with two 
simple evolutionary systems and their neutral shadows. 

2 The Models 

Packard’s Line and Block Models. The Bugs simulation is a series of models 
originated by Norman Packard [11, 2] and subsequently modified in various ways. 
Packard’s simulation is designed to be a very simple model of the evolution of 
sensory-motor strategies. It consists of agents sensing the resources in their local 
environment, moving as a function of what they sense, ingesting the resources
they find, and reproducing or dying as a function of their internal resource levels. 
The model’s spatial structure is a grid of sites with periodic boundary conditions, 
i.e., a toroidal lattice. The resource distributions studied here take two forms: 

Line: a thin continuous strip, one cell in width, that wraps entirely around the world, with all the other sites in the world entirely devoid of resources;
Block: a square block of resources, 15 cells on a side, with all other sites in the world entirely devoid of resources.

In each case, resources are immediately replenished at a site whenever they 
are consumed. The agents constantly extract resources and expend them by 
living and reproducing. Agents ingest all of the resources (if any) found at their 
current location and store them internally. Agents expend resources at each time 
step by “paying” (constant) “existence taxes” and “movement taxes” (variable, 
proportional to distance moved). If an agent’s internal resource supply drops to
zero, it dies and disappears from the world. 
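One time step of this resource bookkeeping can be sketched as follows. The tax parameter names are hypothetical, the paper gives no values for them, and the order (ingest first, then pay taxes) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    reservoir: float  # internal resource supply


def pay_and_survive(agent: Agent, found: float, distance: int,
                    existence_tax: float, movement_tax_per_site: float) -> bool:
    """Apply one time step of resource bookkeeping; return False if the agent dies."""
    # Ingest all resources found at the current site (the site refills at once).
    agent.reservoir += found
    # Constant existence tax plus a movement tax proportional to distance moved.
    agent.reservoir -= existence_tax + movement_tax_per_site * distance
    # An agent whose supply has dropped to zero (or below) dies.
    return agent.reservoir > 0
```

An agent off the resource strip ingests nothing, so repeated large jumps drain its reservoir fastest; this is the pressure that favours small jumps once an agent is on the resources.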

Each agent moves each time step as dictated by its genetically encoded 
sensory-motor map: a table of behavior rules of the form if (environment j 
sensed) THEN (do behavior k). An agent receives sensory information about the 
resources (but not the other agents) in the von Neumann neighborhood of five 
sites centered on its present location in the lattice. In the Line world, there are 
exactly 4 detectable local environments: those detected by agents either on the 
resource strip, immediately to the strip’s left or right, or anywhere else. In the 
Block world, there are exactly 14 detectable local environments: those detected 
by agents either just on one of the four edges, or just off one of the four edges, 
or in one of the four corners, or in the middle of the block, or anywhere else. 
Each behavior $k$ is a jump vector between one and fifteen sites in any one of the eight compass directions, so there are $8 \times 15 = 120$ possible behaviors for each sensory state. Thus, an agent’s genotype, i.e., its sensory-motor map, is just a lookup table of sensory-motor rules. But the space in which adaptation occurs is fairly large, consisting of $120^4 \approx 10^8$ and $120^{14} \approx 10^{29}$ distinct possible genotypes in the Line and Block worlds, respectively. An agent reproduces (asexually, without recombination) if its resource reservoir exceeds a certain threshold.
The parent produces one child, which starts life with half of its parent’s resource 
supply. The child inherits its parent’s sensory-motor map, except that mutations 
may replace the behaviors linked to some sensory states with randomly chosen 
behaviors. 
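The lookup-table genotype and asexual reproduction can be sketched as below. This is an illustration under stated assumptions, not Packard’s code: a behavior is encoded as a hypothetical `(dx, dy)` jump of 1 to 15 sites along one of the 8 compass directions, a genotype is a list of indices into that table (one entry per detectable environment), each entry is replaced with per-entry probability `m_locus`, and the parent keeps the other half of its resources.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical encoding: 8 compass directions x jump lengths 1..15 = 120 behaviors.
COMPASS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
BEHAVIORS: List[Tuple[int, int]] = [(dx * d, dy * d) for dx, dy in COMPASS for d in range(1, 16)]


def genotype_space_size(n_environments: int) -> int:
    """Number of distinct lookup tables: one of 120 behaviors per environment."""
    return len(BEHAVIORS) ** n_environments


def reproduce(genotype: List[int], reservoir: float, threshold: float,
              m_locus: float, rng: random.Random
              ) -> Optional[Tuple[List[int], float, float]]:
    """Return (child genotype, child reservoir, parent reservoir), or None.

    Reproduction happens only when the reservoir exceeds the threshold.
    """
    if reservoir <= threshold:
        return None
    # Each rule's behavior may be replaced by a randomly chosen behavior.
    child = [rng.randrange(len(BEHAVIORS)) if rng.random() < m_locus else b
             for b in genotype]
    half = reservoir / 2  # child starts with half of the parent's supply
    return child, half, half
```

With 4 environments (Line) `genotype_space_size(4)` is 207,360,000, and with 14 environments (Block) it is about $1.3 \times 10^{29}$, matching the orders of magnitude quoted above.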

A given simulation starts with randomly distributed agents containing ran- 
domly chosen sensory-motor strategies. The model contains no a priori fitness 
function, as Packard [11] has emphasized. Agents with maladaptive strategies 
tend to find few resources and thus to die, taking their sensory-motor genes 
with them; by contrast, agents with adaptive strategies tend to find sufficient 
resources to reproduce, spreading their sensory-motor strategies (with some mu- 
tations) through the population. In the Line world, the main adaptations that 
occur are learning how to stay on the resource strip and learning to do so in step 
with the other bugs on the strip (i.e., meshing with the “flock” of other bugs 
on the line). Another, secondary adaptation is optimizing the jump size on the strip (smaller jumps are better). Furthermore, there is a slight adaptive advantage to learning how to get back on the strip when immediately adjacent to it. In the Block world, as in the Line world, one adaptive pressure is to “flock” along with the other bugs, so as to minimize the chances of getting bumped into the resource desert. But the basic adaptive strategy needed to survive on a resource block is to move in a given direction and speed when in the middle of the block until you detect the edge of the block, and then to jump back in the opposite direction into the middle of the block. A subpopulation of bugs following the same strategy will form a flow that rolls across the block and reflects off its edge. Since all bugs in the Line world must flock in step so as not to bump each other off the resource strip, the fitness landscape in the Line world has relatively narrow peaks. By contrast, in the Block world bugs that reflect different distances off the edge can co-exist, and different subpopulations can form along different edges in a resource block, in effect filling different niches, so the Block’s fitness landscape allows for more diversity and thus contains relatively broad peaks.

Neutral Shadows for Packard’s Line and Block Models. The crucial 
property of a “neutral shadow” of a model with emergent genotype dynamics is 
that the shadow system’s evolutionary dynamics are like its target model except 
that a shadow genotype’s activity cannot be due to its adaptive significance, for it has no adaptive significance. The neutral shadow of a Packard Bugs model consists of a population of nominal “bugs” with nominal “genotypes.” A shadow
“bug” has no spatial location and it cannot ingest resources or interact with 
other “bugs.” All it ever does is come into existence, perhaps reproduce (perhaps 
often), and go out of existence; its only properties are its genotype and the times 
of its birth, reproductions (if any), and death. 

Each neutral shadow run corresponds to a specific Line or Block model run. 
The neutral shadow’s birth and death events and mutation rate are directly 
copied from those in the target run. When some creature is born in the target 
run, a shadow parent is chosen at random (with equal probability) from the 
shadow population to reproduce. The new shadow child inherits its parent’s 
genotype unless a mutation gives the child a new genotype. When some creature dies in the Line or Block run, a “creature” is chosen at random from the shadow population and killed. Thus, all selection in the neutral shadow is random.
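The replay of a target run in its neutral shadow can be sketched as follows. This is a minimal sketch under assumptions that are ours, not the paper’s: the target run is summarised as a plain sequence of `"birth"` and `"death"` events, genotypes are integer labels, and every mutation yields a never-before-used label (the chance that a random mutant coincides with an existing genotype in a space of $120^4$ or more genotypes is ignored).

```python
import itertools
import random
from typing import Iterable, List


def run_neutral_shadow(events: Iterable[str], initial: List[int],
                       mutation_rate: float, seed: int = 0) -> List[int]:
    """Replay target-run births and deaths with purely random selection."""
    rng = random.Random(seed)
    population = list(initial)
    # Hypothetical labelling scheme: each mutant gets a fresh integer label.
    new_labels = itertools.count(max(population, default=-1) + 1)
    for event in events:
        if event == "birth":
            # Parent chosen uniformly at random, regardless of its genotype.
            parent = rng.choice(population)
            child = next(new_labels) if rng.random() < mutation_rate else parent
            population.append(child)
        elif event == "death":
            # Victim chosen uniformly at random: no selection on genotype.
            population.pop(rng.randrange(len(population)))
        else:
            raise ValueError(f"unknown event: {event!r}")
    return population
```

Because births, deaths and the mutation rate are copied from the target run, any remaining difference in genotype activity between the two systems is attributable to selection in the target run.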

The evolutionary dynamics in a neutral shadow amount to a neutral diffusion in genotype space. Genotypes arise and go extinct, and their concentrations change over
time, but the genotype dynamics are at best weakly linked to adaptation through 
the birth and death rates determined by adaptation in the Line or Block model. 
When adaptive genotypes are evolving in a Bugs run, one would expect their 
genotype activity levels to be significantly higher than those in the correspond- 
ing neutral shadows. For, although individuals in the Bugs model and its neutral 
shadow have the same birth, reproduction, and death rates, and their mutation 
rates are the same, in the Bugs model natural selection can cull poorly adapted 
genotypes and preserve well adapted genotypes while the selective force in the 
neutral shadow is entirely random. The difference between the activity levels in the Bugs model and its neutral shadow shows how much natural selection affects Bugs activity.

A Generic Neutral Model. The generic model of neutral genotype evo- 
lution consists of a population of individuals that reproduce and die in a fixed 
genotype space. The genotype space is defined by some number of loci at each of 
which some number of alleles are segregating. Parameters that need to be specified in the generic neutral model are $N$, the size of the population of individuals; $r$, the reproduction rate (the number of individuals that die and reproduce per time step); $l$, the number of loci; $a$, the number of possible alleles per locus; and $m_l$, the probability that the allele at a given locus will be mutated when an individual is born. (The probability that an offspring will have a mutation somewhere in its genome, i.e., the mutation rate per individual, is $m_i = 1 - (1 - m_l)^l$.) The parameters together determine the model’s generic behavior. The genotype space is a hypercube of dimension $l$ and size $a^l$ (the number of possible genotypes), with each location in this space corresponding to a given genotype. The current state of the model is given by the distribution of $N$ individuals in genotype space. The population wanders through the space stochastically, spreading and clustering at random.

The individuals in the initial population are assigned genotypes at random. 
Time is discrete, and moves forward each time step by iterating the following 
two-step algorithm: (1) r individuals (selected at random, with replacement) 
each produce a child that is genetically identical to itself except for mutations. 
Mutant alleles are chosen at random from the set of possible alleles. (2) r in- 
dividuals (selected at random, without replacement) die and are removed from 
the population and are replaced by the r children produced at step (1). 
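The two-step update can be sketched as below. The function names are hypothetical and the genotype representation (a tuple of $l$ allele indices in $0, \dots, a-1$) is an assumption; the sampling rules follow the description above, including that a mutant allele is drawn from all $a$ alleles and may therefore coincide with the old one. The helper `per_individual_rate` evaluates $m_i = 1 - (1 - m_l)^l$; for example, $l = 4$ and $m_l = 0.01$ give $m_i = 1 - 0.99^4 \approx 0.0394$.

```python
import random
from typing import List, Tuple

Genotype = Tuple[int, ...]


def per_individual_rate(m_locus: float, loci: int) -> float:
    """m_i = 1 - (1 - m_l)^l: chance that a newborn carries at least one mutation."""
    return 1.0 - (1.0 - m_locus) ** loci


def initial_population(n: int, loci: int, alleles: int,
                       rng: random.Random) -> List[Genotype]:
    """N individuals with genotypes drawn uniformly from the a^l hypercube."""
    return [tuple(rng.randrange(alleles) for _ in range(loci)) for _ in range(n)]


def step(population: List[Genotype], r: int, alleles: int, m_locus: float,
         rng: random.Random) -> None:
    """Advance the generic neutral model by one time step (in place)."""
    # (1) r parents, chosen at random with replacement, each make one child.
    children = []
    for _ in range(r):
        parent = rng.choice(population)
        # Each locus mutates with probability m_l; the new allele is drawn
        # from all a alleles, so it can coincide with the old one.
        child = tuple(rng.randrange(alleles) if rng.random() < m_locus else allele
                      for allele in parent)
        children.append(child)
    # (2) r individuals, chosen at random without replacement, die and are
    # replaced by the children, so the population size N never changes.
    for slot, child in zip(rng.sample(range(len(population)), r), children):
        population[slot] = child
```

A Line-like run would call `initial_population(N, 4, 120, rng)` and then `step(pop, r, 120, m_locus, rng)` once per time step, with $N$ and $r$ set from the corresponding Bugs run; `len(set(pop))` then gives the diversity $D$ at any time.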

This neutral model does not closely correspond to those systems in which 
some of the generic model parameters are variable. E.g., in Tierra [12] the number 
of loci is variable; indeed, it is not clear exactly what to count as a locus in 
Tierra. In addition, population size and reproduction rate vary over time in 
many artificial models of evolution, such as Echo [7] and Packard’s Bugs models. 
Still, the neutral model might apply reasonably well to these systems if the 
relevant neutral model parameters are set to plausible corresponding values. 
For the comparisons here we set N and r to the mean observed value of the 
corresponding parameter in the Bugs model. One of the goals of this study is to 
assess the usefulness of the generic neutral model under such an approximation. 



3 Experimental Methods 

We observed the behavior of the Line and Block models, the Line and Block 
neutral shadows, and the Line and Block generic neutral models across the mu- 
tation rate spectrum (varied on a log scale). All Packard model simulations were 
started with a randomly initialized population of 500 individuals. We did at
least 10 runs at every mutation rate in each model. We varied the simulation 
time between 5 x 10® and 5 x 10^ depending on the mutation rate. The transient 
time is longer at lower mutation rates, and we aimed to have simulations that 
were long enough to minimize variance due simply to simulation time. The parameters for the generic neutral model were set to correspond to the Line model
(four loci and 120 alleles per locus) and the Block model (fourteen loci and 120 
alleles per locus). We determined average population size N and reproduction 
rate r from each Line or Block model run and set corresponding parameter val- 
ues in generic model runs. We dumped 5000 data points in each simulation, so 
the time interval between data dumps varied with run length. We made sure 
data dumping frequencies did not influence our results. In the generic neutral 
model evolutionary activity was calculated continuously, so the exact activity 
value could be recorded in each data dump. But in the Bugs and neutral shadow 
models genotype data was only sampled at each data dump. So, for simplicity, 
we assumed that a genotype that first appeared at a certain time arose immedi- 
ately after the previous data dump. This procedure loses all information about 
short-lived genotypes that arose and went extinct between data dumps, and it 
significantly overestimates the age of short-lived genotypes that appear in only 
a few data dumps. This bias was minimized by using shorter simulation time 
and data dumping intervals for high mutation rates. 
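The dating rule used for the Bugs and neutral shadow data can be sketched as follows. This is an illustration, assuming each data dump is available as a `(time, set_of_genotypes)` pair, that genotypes present in the first dump are dated to time 0, and that a genotype which disappears and later reappears is treated as a new arrival. Like the procedure described above, it never sees genotypes that arise and die between dumps, and it overestimates the age of genotypes that arose just before the dump in which they are first seen.

```python
from typing import Dict, Hashable, List, Set, Tuple


def estimated_ages(dumps: List[Tuple[int, Set[Hashable]]]) -> List[Dict[Hashable, int]]:
    """Estimate the age of every genotype present at each data dump.

    A genotype first seen at a dump is assumed to have arisen immediately
    after the previous dump (time 0 for the first dump).
    """
    ages: List[Dict[Hashable, int]] = []
    origin: Dict[Hashable, int] = {}
    previous_time = 0
    previously_present: Set[Hashable] = set()
    for time, present in dumps:
        for genotype in present:
            if genotype not in previously_present:
                # Newly observed (or reappeared): date it to the previous dump.
                origin[genotype] = previous_time
        ages.append({g: time - origin[g] for g in present})
        previously_present = set(present)
        previous_time = time
    return ages
```

Averaging each dictionary’s values gives the activity estimate $A$ at that dump; shorter dump intervals shrink the overestimate, which is why they were used at high mutation rates.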

4 Results and Discussion 

The top of Figure 1 shows the time average of the diversity normalized by the 
time average of the population size, for the Line and Block models, and for 
shadow and generic neutral models of each, as a function of mutation rate, $m_i$.
At very low mutation rates diversity levels off. This diversity floor is an artifact 
of the finite population size. Indefinitely larger populations would indefinitely 
lower this floor. These diversity data at mutation rates above the diversity floor 
show three salient results: First, the dependence of normalized diversity on mu- 
tation rate for all four neutral models is strikingly similar. This provides evidence 
that the generic neutral model is a good approximation of the neutral shadows. 
Second, the normalized diversity of both Line and Block models is strikingly 
lower than that for the neutral models — the expected result of natural selection 
in the Line and Block models versus the random selection in the neutral models. 
Third, the normalized diversity of the Line model is strikingly lower than that 
for the Block model. This can be explained by the different fitness landscapes 
in the Block and Line world. Each fitness landscape has several local peaks, but 
the Block peaks are broader than the Line peaks and broader peaks support a 
more diverse population. 

The bottom of Figure 1 shows the time average of activity, A, as a function 
of mutation rate, $m_i$, for the Line and Block models and for shadow and generic
neutral models of each. The activity ceiling in the Line and Block models at lower 
mutation rates is partly an artifact of simulation time; a genotype’s observed 
age cannot exceed simulation time and the longest runs we did lasted 5 x 10^ 
time steps. Longer simulations would raise the observed activity values at very 
low mutation rates. Where unaffected by the activity ceiling artifact, the time- 
averaged activity data show three significant results. 
Fig. 1. Above: Time average of diversity (normalized by dividing by time average of average population size), as a function of mutation rate per individual, $m_i$, for the Line and Block models, their neutral shadows, and generic neutral models for them. Below: Time average of evolutionary activity, $A$, as a function of mutation rate per individual, $m_i$, for the Line and Block models, their neutral shadows, and generic neutral models for them, along with lines showing the power laws which approximately fit these data (at certain mutation rates). In both graphs, error bars indicate standard deviations of time averages computed from at least ten runs per mutation rate.
First, activity in all the neutral models is quite similar, especially considering that we did not account for differences in population size and reproduction rate in the Line and Block worlds. Furthermore, activity’s dependence on mutation rate approximately fits a power law of the form $A \propto m_i^{-\alpha_{\text{neutral}}}$, where $\alpha_{\text{neutral}} = 1.0 \pm 0.1$ (all error bounds are standard deviations). This power law can be explained by adapting an argument of Kimura [8]. Kimura found that the average time it takes for a new mutant gene to reach fixation during neutral evolution, on the assumption that genes get substituted one after another and not at the same time, can be described by two time scales. The first time scale, the time it takes on average for a neutral mutant to spread throughout the population, is basically proportional to the population size. The second time scale, the time it takes on average for such a mutant gene to occur in the population, is inversely proportional to the number of mutations that occur and hence proportional to $m_i^{-1}$. Kimura’s assumption that genes are substituted one after another corresponds in our neutral models to the assumption that genotypes are substituted one after another, and this assumption holds when the mutation rate is not too high. So, for low mutation rates Kimura’s discussion applies equally well to genotype substitution in our neutral models, with Kimura’s new mutant gene corresponding to our new mutant genotype. For low enough mutation rates only the second time scale is relevant; the first time scale is basically constant (because population size is basically constant) and becomes negligible. So, evolutionary activity, which corresponds to the mean lifetime of genotypes, will be proportional to the second time scale, i.e., to $m_i^{-1}$.
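Exponents such as $\alpha_{\text{neutral}}$ can be estimated by a straight-line fit on log-log axes. The sketch below uses ordinary least squares of $\log_{10} A$ against $\log_{10} m_i$; this is an assumed fitting procedure for illustration, since the paper does not state how its fits were made, and choosing the mutation-rate range over which the data look linear is left to the caller.

```python
import math
from typing import Sequence


def power_law_exponent(rates: Sequence[float], activities: Sequence[float]) -> float:
    """Least-squares estimate of alpha in A ~ m_i**(-alpha), fitted on log-log axes."""
    if len(rates) != len(activities) or len(rates) < 2:
        raise ValueError("need at least two (rate, activity) pairs of equal length")
    xs = [math.log10(m) for m in rates]
    ys = [math.log10(a) for a in activities]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # The fitted slope of log A against log m_i is -alpha.
    return -sxy / sxx
```

Synthetic data generated exactly from $A = 5\,m_i^{-1}$ returns $\alpha = 1$ (up to rounding), the value the Kimura argument predicts for the neutral models.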

The second significant result in the activity data is that the magnitude of 
activity is significantly higher in both Line and Block than in the neutral models, 
as is the slope of activity’s dependence on mutation rate. Lower neutral model 
activity can be explained by the lack of adaptation in the neutral models. Ran- 
dom selection in neutral models does not preferentially preserve well-adapted 
genotypes but natural selection in the Line and Block models does, so Line and 
Block activity is expected to exceed neutral model activities. 

The third significant activity result is that the magnitude of the activity is 
significantly higher in the Line model than in the Block model, as is the slope 
of activity’s dependence on mutation rate. Although the data are somewhat 
ambiguous, at relatively higher mutation rates the Line and Block activity’s 
dependence on mutation rate might approximately fit power laws with $\alpha_{\text{Line}} = 2.2 \pm 0.2$ and $\alpha_{\text{Block}} = 1.4 \pm 0.1$. Our activity data at lower mutation rates are clearly affected by the activity ceiling artifact; we expect that the slope of the Line and Block activity will fall to $-1$ at low enough mutation rates, but
the activity ceiling prevents us from resolving this here. We expect that the 
differences in the Line and Block fitness landscapes can explain the differences 
in activity magnitude and slope in the Line and Block worlds. Evidently, the few successful genotypes residing on narrow fitness peaks persist significantly longer than the more numerous genotypes residing on broader peaks (which therefore have a lower average population size per genotype). In addition, on narrow fitness peaks more than on
broad peaks, the extent of adaptation seems to fall faster as mutation rate rises. 
Current work includes resolving the activity’s different dependence on mutation
rate in the Line and Block models. This project is especially engaging because 
we have precise quantitative results needing explanation. 

5 Conclusions 

Comparing diversity and evolutionary activity in Packard’s Line and Block worlds and in their shadow and generic neutral models yields a variety of precise
and interesting quantitative results. Some are due to the fact that the neutral 
models are devoid of adaptation, others to the different fitness landscapes in the 
Line and Block worlds. Both illustrate the power and promise of using neutral 
models to quantify adaptive evolution in different evolutionary systems. 

The absence of adaptation in the neutral models explains their relatively 
high diversity and low activity, compared with the Line and Block worlds, as 
well as the lower slope with which activity depends on mutation rate in the neutral models. The differences we observed have three related implications: they confirm the appropriateness of using activity statistics to measure the extent of adaptive structure in an evolving system, thereby confirming the appropriateness
of using neutral model activity to measure the amount of activity that can be 
attributed to adaptation as opposed to other evolutionary forces like chance and 
necessity, and thereby confirming the importance of the generic neutral model. 
The generic neutral model closely approximates the behavior of different special- 
purpose neutral shadows; dependence of diversity and activity on mutation rate 
is remarkably similar in all of them. Having one simple generic neutral model 
removes the need to make a new neutral shadow for each evolutionary model and 
allows us to study the general properties of neutral models in one fell swoop. To 
be sure, the generic neutral model has so far passed only a preliminary test, and 
its final confirmation can come only if it successfully approximates many more 
neutral shadows. Conducting these further tests is a subject of current work, 
as is discerning the generic neutral model’s typical behavior. The results dis- 
cussed above reveal an important difference between the evolutionary dynamics 
of neutral and adaptive evolutionary systems, and the generic neutral model is 
an excellent tool for discerning and understanding this difference. 

The magnitude and slope of activity’s dependence on mutation rate also reveal the fundamental difference between the Line and Block fitness landscapes.
This, too, confirms the appropriateness of measuring the extent of adaptive 
structure in a system with activity statistics, thereby underscoring the utility 
of neutral models in general and the generic neutral model in particular. For 
example, we can measure how much the observed activity reflects the force of 
adaptive evolution by appropriately normalizing observed activity against the 
corresponding neutral model, e.g., by dividing observed activity by the corre- 
sponding neutral activity. If we call this fraction by which observed activity 
exceeds neutral (non-adaptive) activity a system’s excess evolutionary activity 
[13], then a power-law dependence of activity on mutation rate (over part of the 
mutation rate spectrum) implies simple power-law dependence of excess activity on mutation rate in the Line and Block models. Such excess activity power
laws raise intriguing questions: Does excess activity show similar power laws in 
a broad class of evolutionary models? If so, what exactly explains the magnitude 
and exponent in the laws? Answering any of these questions would significantly 
advance our quantitative understanding of adaptive evolution. And a key tool for 
facilitating these precise quantitative comparisons is the generic neutral model. 
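The step from activity power laws to an excess-activity power law can be written out explicitly. This worked form assumes that observed and neutral activity both follow the power laws reported in Section 4 over the same range of relatively high mutation rates, and it uses only the central exponent values (error bars ignored); it is an illustration, not a fitted result.

$$
E(m_i) = \frac{A(m_i)}{A_{\text{neutral}}(m_i)} \propto \frac{m_i^{-\alpha}}{m_i^{-\alpha_{\text{neutral}}}} = m_i^{-(\alpha - \alpha_{\text{neutral}})}
$$

$$
\alpha_{\text{Line}} - \alpha_{\text{neutral}} \approx 2.2 - 1.0 = 1.2, \qquad \alpha_{\text{Block}} - \alpha_{\text{neutral}} \approx 1.4 - 1.0 = 0.4
$$

So, within that range, excess activity would fall off roughly as $m_i^{-1.2}$ in the Line model and as $m_i^{-0.4}$ in the Block model.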
Abstract. The concept of biodiversity has received rapidly increasing interest in the biosciences during the last decade. Yet, it is unclear and disputed how biodiversity should be characterised and measured. We compared several biodiversity measures by applying them to data retrieved from the LindEvol-GA model of evolution. A series of LindEvol-GA runs with mutation rates ranging from zero (producing no diversity) to one (producing maximal, but biologically meaningless, diversity) was analyzed with the measures to be compared. At intermediate mutation rates, biologically meaningful diversity can emerge.

We show that biodiversity measures can be classified according to the 
way in which they respond to these various types of diversity, and we 
discuss some implications of our observation for the design, choice, and 
application of biodiversity measures. 



1 An approach to biodiversity 

During the last ten years, the term "biodiversity" has become widely used in the biosciences [1, 2], including Artificial Life [3], as well as by the general public, to refer to collections of biological entities which coexist in an intricately orchestrated, organismic fashion. But despite its widespread usage, there is no satisfactory scientific definition of biodiversity. In fact, quite a diversity of biodiversity definitions and measures have been proposed in the literature [4, 5, 6, 7, 8, 9], and some of them are mutually incompatible.

Clearly, this problem is to some extent due to diverging concepts and objec- 
tives, which sometimes are political or economic in nature rather than scientific. 
However, even in the scientific domain, there are various sources of difficulties. 
Biodiversity is generally agreed to be a phenomenon occurring on multiple levels 
of biological organisation [2, 10, 11]. Unfortunately, biodiversity concepts that 
focus on different levels of biological organisation (e.g. molecular biology, morphology, ecology, evolutionary biology etc.) are sometimes difficult to reconcile.

Even on a given level of biological organisation, quantitative characterisation 
of biodiversity is not trivial. For example, simply counting species is a widely ap- 
plied method for the quantification of biodiversity. However, while the number of 
species in a system is often suitable as an indicator of the system’s biodiversity, it 
is possible to construct systems with equal numbers of species which nonetheless 
cannot be expected to be equal in biodiversity; e.g. a box containing a thousand different insect species would be considered to be less "biodiverse" than a box containing a thousand species forming a small ecosystem of plants, microbes and some insects.

In response to this problem, biodiversity measures which take the evolution- 
ary relations between species into account have been developed [7, 8, 12]. One 
may consider the contribution of distantly related species to biodiversity to be greater than the contribution of closely related species (e.g. [12]), or define the conservation of evolutionary history as the goal of biodiversity protection [6]. But according to such concepts, the bulk of biodiversity would reside
within prokaryotes, because they diversified in evolution long before eukaryotes 
appeared. A tropical rain forest, the standard example for high biodiversity, 
might appear as a "monoculture" of two rather special groups of multicellular 
eukaryotes, namely animals and plants, from such a perspective. 

Thus, both disregarding evolutionary depth and naively equating biodiversity with evolutionary depth fail to capture biodiversity adequately. The rainforest example implies that it is something like "the right mix" of degrees of evolutionary relatedness which characterises biodiversity. Biodiversity thus appears to be a life phenomenon that emerges somewhere between order (close relatedness, complete identity in the limit) and chaos (distant relatedness, total unrelatedness in the limit). We therefore tested several biodiversity measures to see whether they show a maximum at some edge of chaos in an Artificial Life model of evolution.

All Artificial Life models of evolution are imperfect representations of bio- 
logical evolution. Nonetheless, important aspects of biological evolution can be 
captured by models of evolution. Specifically, emergence of complex phenomena 
has been observed in many computer simulations. Therefore, Artificial Life mod- 
els provide a suitable basis for investigating possible links between biodiversity 
and such emergent phenomena. 



2 LindEvol-GA runs with increasing mutation rates 



For the investigation presented here, we used LindEvol-GA [13, 14], a computer 
model of the evolution of plant growth patterns. Plants in LindEvol-GA grow 
in a two-dimensional lattice world in which they compete for space and energy. 
After a vegetation period, a fitness value is assigned to each plant genome based 
on the amount of energy stored in the plant. Because the plants grow together 
in one lattice, this fitness value depends on the interactions of a plant with 
its neighbours. A new generation of genomes is constructed by removing some 
genomes from the population and creating an equal number of copies of genomes 
randomly drawn from the surviving part of the population. All genomes in the 
population are then mutated. The fraction of the population which is removed 
each generation is specified by a control parameter called the selection rate. The 
control parameter governing mutation in the runs presented here is the global 
replacement mutation rate, which is the probability with which the value of a byte in a genome is replaced with a random value in one generation. Insertions and deletions were not used in the runs presented here.
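The generation step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the LindEvol-GA source: genomes are modelled as `bytes`, the removed genomes are assumed to be the least fit ones, and `next_generation` is a hypothetical name.

```python
import random

def next_generation(genomes, fitness, selection_rate, mutation_rate, rng):
    """One generation: remove, refill with copies, then mutate (hypothetical sketch).

    genomes: list of bytes objects; fitness: list of numbers in the same order.
    Assumption: the genomes removed are those with the lowest fitness.
    """
    n = len(genomes)
    n_removed = int(round(selection_rate * n))
    # Rank genome indices by fitness, best first.
    order = sorted(range(n), key=lambda i: fitness[i], reverse=True)
    survivors = [genomes[i] for i in order[: n - n_removed]]
    # Refill the population with copies of randomly drawn survivors.
    offspring = [rng.choice(survivors) for _ in range(n_removed)]
    population = survivors + offspring
    # Global replacement mutation: every byte of every genome is replaced by a
    # random value with probability mutation_rate (no insertions or deletions).
    return [
        bytes(rng.randrange(256) if rng.random() < mutation_rate else b for b in g)
        for g in population
    ]
```

With `mutation_rate=0.0` the population only contains copies of survivors, and with `mutation_rate=1.0` every byte is redrawn, matching the two neutral controls discussed below.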

At the start of each time step the effective mutation rate is set to the global 
mutation rate for all genomes. A plant may multiply or divide its individual 
effective mutation rate by 2 at the expense of one energy unit. Repeated modifi- 
cations are possible. They reduce the fitness of the genome, but can increase the 
chance of accurate replication. Reduction of effective mutation rates can thus be 
an evolutionarily stable strategy which evolves in some runs with relatively high 
global mutation rates (see [15] for details). 
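The mutation rate modification can be summarised by an integer exponent $k$: the effective rate is the global rate times $2^k$, and each doubling or halving costs one energy unit. The hypothetical helper below sketches this bookkeeping; capping the rate at 1.0 is our assumption.

```python
def effective_mutation_rate(global_rate, exponent):
    """Return (effective rate, energy cost) after |exponent| modifications.

    exponent > 0: the rate is doubled that many times; exponent < 0: halved.
    Each modification costs one energy unit. The cap at 1.0 is an assumption,
    since a per-byte probability cannot exceed 1.
    """
    rate = global_rate * 2.0 ** exponent
    energy_cost = abs(exponent)
    return min(rate, 1.0), energy_cost
```

For example, a plant in a run with global rate 0.18 that halves its rate twice pays two energy units for an effective rate of 0.045.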

We performed a series of LindEvol-GA runs in which the global replacement mutation rate ranged from 0.0 to 1.0. From 0.0 to 0.4, the mutation rate was increased in steps of 0.01; larger increments were applied above 0.4. The selection rate was set to 0.5 in all runs. Mutation rate adaptation was enabled, and all other control parameters were also chosen as in [15]. Since monoparental reproduction is used in our LindEvol-GA runs, the phylogenetic tree
connecting all genomes can be recorded. Every ten time steps, this phylogenetic 
tree was used to compute the phylogeny-based biodiversity measures described 
below. The initial phase of each run, in which descendants of more than one 
of the randomly created genomes of the start population exist so that multiple 
unconnected phylogenetic trees are present, was excluded from this analysis. 

Both extremes of the global mutation rate constitute neutral controls (cf. [16]). With no mutation, all genomes are identical after an initial phase, so the survivors are effectively drawn at random during selection. With maximal mutation, offspring are totally unrelated to their parents; therefore, achieving a high fitness value and surviving selection is again a pure chance event. Only with intermediate mutation rates can new genotypes and phenotypes which inherit information from their predecessors arise, and thus, information which is biologically meaningful (with respect to the artificial biology of LindEvol) accumulates in the genomes. Assuming that biodiversity is constituted by collections of entities which are different in a biologically meaningful way, one thus expects that biodiversity measures should yield higher values for runs with intermediate mutation rates than for those with no or maximal mutation.



3 Biodiversity measures 

Most measures which we tested in our analysis were proposed by Williams et al. [7]. These measures evaluate the topology of a rooted phylogenetic tree; the lengths of the edges in the tree are not taken into account. All measures are based on the following quantities:

- $n$ is the number of terminal nodes in the phylogenetic tree.

- $p_j$ is the number of nodes in the path from the terminal node $j$ to the root node of the phylogenetic tree.

- $s_{jk}$ is the number of nodes which are shared among the paths from node $j$ to the root and from node $k$ to the root.

- $u_{jk}$ is the number of nodes which are on the path from node $j$ to the root but not on the path from node $k$ to the root.
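As a sketch of how these quantities can be obtained from a recorded phylogeny, the following Python code stores the tree as a hypothetical child-to-parent mapping (the root maps to `None`) and assumes that a path counts both its terminal node and the root. With that convention, $p_j = s_{jk} + u_{jk}$ holds for every pair.

```python
def path_to_root(node, parent):
    """Nodes on the path from `node` to the root, both ends included (assumed)."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path

def williams_quantities(j, k, parent):
    """Return (p_j, s_jk, u_jk) for terminal nodes j and k."""
    path_j = set(path_to_root(j, parent))
    path_k = set(path_to_root(k, parent))
    p_j = len(path_j)
    s_jk = len(path_j & path_k)   # nodes shared by both root paths
    u_jk = len(path_j - path_k)   # nodes only on j's root path
    return p_j, s_jk, u_jk
```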

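On the basis of these quantities, Williams et al. define pairwise divergence measures $d_x(j,k)$ between terminal nodes $j$ and $k$, each of which is a function of $s_{jk}$, $u_{jk}$ and $u_{kj}$. One of them, for example, divides the number of nodes on the path connecting $j$ and $k$ by $2s_{jk} + u_{jk} + u_{kj} = p_j + p_k$:

$$\frac{u_{jk} + u_{kj} + 1}{2s_{jk} + u_{jk} + u_{kj}}$$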
Williams et al. introduce four biodiversity measures which are derived from these divergence measures by

$$m_x := n \cdot \mathrm{mean}(d_x) \qquad (2)$$

where $x$ is III, IV, V and VI, respectively. Further measures used in [7] are the plain number of species ($m_I := n$), root weight biodiversity, defined as

$$m_{II} := \sum_{j=1}^{n} \frac{1}{p_j} \qquad (3)$$

and finally, dispersion diversity, defined as

$$m_{VII} := n \cdot \left(\mathrm{mean}(d_{IV}) - \mathrm{std.dev.}(d_{IV})\right) \qquad (4)$$

The Roman indices for these diversity measures were chosen to match those used in [7]. We determined measures II through VII for these phylogenetic trees.
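Equations (2) and (4) can be illustrated with a small sketch that accepts any pairwise divergence function `d`. Averaging over all unordered pairs of distinct terminal nodes and using the population standard deviation are our assumptions, and the divergence values in the test are made up for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev

def mean_divergence_diversity(leaves, d):
    """m_x := n * mean(d_x), averaged over all unordered leaf pairs (assumed)."""
    values = [d(j, k) for j, k in combinations(leaves, 2)]
    return len(leaves) * mean(values)

def dispersion_diversity(leaves, d):
    """n * (mean(d) - std.dev.(d)); the population std. dev. is an assumption."""
    values = [d(j, k) for j, k in combinations(leaves, 2)]
    return len(leaves) * (mean(values) - pstdev(values))
```

Subtracting the standard deviation penalises trees in which a few very divergent pairs dominate the mean.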

An alternative approach to measuring biodiversity was proposed by Nee and May [6], who suggested using the length of a phylogenetic tree as a biodiversity
measure. The tree length is defined as the sum of the lengths of all edges in 
the phylogenetic tree. Evidently, this measure differs from those developed by 
Williams et al. in that edge lengths are taken into account. Edge lengths in the 
trees retrieved from LindEvol-GA simulations are given in generations. 

As a third approach, we used distance distribution complexity (DDC) [13] as a biodiversity measure. DDC is defined as the Shannon entropy of the distribution of (discrete or discretized) distance values. In [13], the edit distance between genomes was used for DDC calculation. In this paper, we also calculate DDC on the basis of phylogenetic distances. The phylogenetic distance of two terminal nodes is defined as the sum of the lengths of the edges on the path connecting these nodes. This distance is computed from the trees retrieved from the LindEvol-GA runs. We denote edit distance based DDC by $C_{edit}$ and phylogenetic distance based DDC by $C_{phyl}$.
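A minimal sketch of $C_{phyl}$ follows. It uses a hypothetical `(parent, edge_length)` tree representation, measures entropy in bits (the base is our assumption), takes all unordered pairs of terminal nodes (also an assumption), and treats distances as already discrete, which holds for edge lengths counted in generations.

```python
import math
from collections import Counter
from itertools import combinations

def distances_to_ancestors(node, parent):
    """Map each node on the path node -> root to its distance from `node`."""
    dist = {node: 0}
    total = 0
    while parent[node] is not None:
        node, length = parent[node]
        total += length
        dist[node] = total
    return dist

def phylogenetic_distance(j, k, parent):
    """Sum of edge lengths on the path between j and k (via their lowest common ancestor)."""
    up_j = distances_to_ancestors(j, parent)
    up_k = distances_to_ancestors(k, parent)
    return min(up_j[a] + up_k[a] for a in up_j if a in up_k)

def distance_distribution_complexity(distances):
    """Shannon entropy, in bits, of the distribution of discrete distance values."""
    counts = Counter(distances)
    total = len(distances)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def c_phyl(leaves, parent):
    """DDC over the phylogenetic distances of all unordered pairs of leaves."""
    return distance_distribution_complexity(
        [phylogenetic_distance(j, k, parent) for j, k in combinations(leaves, 2)])
```

A population of identical distances yields zero entropy, while many distinct distance values in equal proportions maximise it.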

Finally, we also combined the concepts of Williams et al. and edit distance into two new measures. One is a combination of mean distance diversity (equation 2) and edit distance:

$$m_e := n \cdot \mathrm{mean}(\text{edit distance}) \qquad (5)$$

while the other one was constructed from the distance dispersion approach (equation 4) and edit distance:

$$m_{edisp} := n \cdot \left(\mathrm{mean}(\text{edit distance}) - \mathrm{std.dev.}(\text{edit distance})\right) \qquad (6)$$
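The two new measures can be sketched as below. The unit-cost Levenshtein distance, the averaging over all unordered pairs of genomes, and the population standard deviation are assumptions made for illustration; at least two genomes are required.

```python
from itertools import combinations
from statistics import mean, pstdev

def edit_distance(a, b):
    """Levenshtein distance with unit costs for substitution, insertion, deletion."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]

def edit_diversity(genomes):
    """Return (m_e, m_edisp) for a list of at least two genomes."""
    n = len(genomes)
    d = [edit_distance(a, b) for a, b in combinations(genomes, 2)]
    m_e = n * mean(d)
    m_edisp = n * (mean(d) - pstdev(d))
    return m_e, m_edisp
```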

4 Results 

Fig. 1 shows the time averages of all biodiversity measures which are based on the evaluation of the phylogenetic trees retrieved from the runs. Quite strikingly, none of these measures exhibits any significant response to the global mutation rate¹. This is true for the measures which operate on tree topology only (measures II to VII) as well as for those which take edge length into account (tree length and $C_{phyl}$).

The results from the measures which are based on genetic distances (i.e. the edit distance between genomes) are shown in Fig. 2. The number of different genomes is also included as an approximation to the number of species, which is often used to quantify biodiversity. In strong contrast to the graphs in Fig. 1, the quantities shown in Fig. 2 are all pronouncedly correlated with the global mutation rate.

There are two runs (those with global mutation rates of 0.18 and 0.37) which deviate very significantly from the general trends in the graphs shown in Fig. 2. In these runs, mutation rate adaptation evolves. For the global mutation rate 0.18, this effect is strongest. Data from this run are shown in Fig. 3. Shortly after time step 1000, a pronounced transition to mutation rate adaptation is indicated by a sharp drop of the average mutation modification exponent. This transition has marked effects on average fitness, $C_{edit}$ and $m_e$ (left panel of Fig. 3). On the other hand, no sign of this transition is visible in the time series of various phylogeny-based diversity measures (right panel).

Given the expectation that biodiversity measures should respond to ran- 
domization, these observations provide a strong indication that biodiversity can 
be measured much more adequately on the basis of genetic distances (or other 
pairwise distances) than on the basis of evaluating phylogenetic trees. While it 
cannot be ruled out that other phylogeny-based measures would exhibit a sig- 
nificant response to the mutation rate, it is clear that none of the measures we 
tested does so. 

In Fig. 2, two modes of response to mutation rate can clearly be distinguished: while the number of different genomes, $m_e$ and $m_{edisp}$ rise monotonically with the mutation rate, $C_{edit}$ steeply rises from 0 in the run without mutation to a maximum at low nonzero mutation rates and decays with further growth of the mutation rate, as described before [13]. Thus, mean edit distance (equation 5) turns out to be a traditional complexity measure (cf. [17]), like the number of different genomes. The example of $C_{edit}$ illustrates, however, that genetic distances are also suitable for calculating alternative complexity measures.

¹ The variation seen in mean tree length (Fig. 1g) is mainly correlated to the number of trees that were analyzed, which varies because the initial phase, in which multiple independent phylogenies coexist, may have different lengths in different runs.




Figure 1. Biodiversity measures calculated on the basis of phylogenetic trees. The time averages computed over the entire runs are shown as a function of the global replacement mutation rate setting used in the run. Error bars indicate standard deviations.
Figure 2. Number of different genomes and biodiversity measures calculated on the basis of genetic distances. Error bars indicate standard deviations (very narrow in these graphs).



5 Conclusions 

As a result of our investigation, biodiversity measures can be classified into three 
types, according to their response to increasing mutation rates: 

1. Measures which are insensitive to randomization. 

2. Measures which grow monotonically with increasing mutation rates.

3. Measures which yield low values at extremely low and high mutation rates, 
and elevated values in between. 

While the second and the third type are well known in Artificial Life [18, 19] 
and in complex systems, the first type comes as a surprise and raises some questions. One question is: why do the evolutionary transitions not leave any traces in the phylogeny-based measures? Our answer to this question is that the bare phylogeny is not a useful source of information about biodiversity, at least in LindEvol-GA. A phylogeny arises even in the control runs, where no diversity is generated (zero mutation control) or no biologically meaningful diversity emerges (maximum mutation control). Phylogenies that emerge by evolution which is governed by biological semantics either are not significantly different from random phylogenies, or the biodiversity measures we tested fail to respond
to such differences. In contrast to this, signatures of the evolution of biologically 
meaningful information, i.e. of biosemiosis [20], can readily be detected in the 
genetic distances between the genomes in LindEvol-GA. It may thus turn out 
that phylogenetic diversity, as quantified by the measures we investigated, is less 
suitable as an index for biodiversity than genetic diversity. 

While the phylogeny based measures have not produced significantly different 
values for random and biologically meaningful phylogenies from LindEvol-GA, 
these measures did yield reasonable results in other contexts. Certainly, it is 
possible that these measures may detect aspects of biodiversity which cannot 
be modelled by LindEvol-GA. Another explanation, however, is that there is a 
qualitative difference between the LindEvol-GA data and the other data with 
which the measures were tested. Usually, phylogenies are reconstructed based on 
differences in genes or other biologically meaningful properties. These signatures 
of biological meaning may be partially preserved during reconstruction, and the
measures may respond to this information. In contrast to this, we extracted 
phylogenies from LindEvol-GA independently of differences in genomes or phe- 
notypes, thereby eliminating the information which is relevant for biodiversity. 
According to this line of reasoning, measurement of biodiversity should be based on biological distance data directly; the construction of a phylogeny from these data may be an unnecessary step that obscures the biodiversity signal.

Figure 4. Scatter plots showing the correlation of $C_{edit}$ with mean fitness and the mean number of used genes.

Finally, a question that also remains is whether measures of the second or 
of the third type are more adequate for characterising biodiversity. We think 
that the scatter plots shown in Fig. 4 provide a clue that type 3 measures may 
capture the essential aspects of biodiversity better than type 2 measures. The 
semantics of LindEvol-GA define that achieving high fitness values is meaning- 
ful in that this maximises the chances for successful reproduction. High mean 
fitness values therefore indicate successful accumulation of information which is 
meaningful in LindEvol-GA’s artificial biology. Fig. 4a shows that there is a pronounced monotonic positive correlation between $C_{edit}$ and mean fitness, while Fig. 4b reveals that the highest values of $m_e$ occur in runs where mean fitness
is low. From Fig. 2c, it can be concluded that the low fitness values are due to 
high mutation rates which prevent the accumulation of significant amounts of 
biologically meaningful information. 

Regarding future directions, we intend to include more diversity measures and 
to use additional Artificial Life models. We argue that correlations to aspects of 
biosemiosis should be considered when new biodiversity measures are designed. 
Assessing such correlations is notoriously difficult for non-artificial systems, but 
the accumulation of molecular data may open new possibilities. For example, 
progress in genomics may soon make it possible to analyse the correlation between numbers of active genes and biodiversity measures, as shown for LindEvol-GA data and $C_{edit}$ in [13]. Molecular data may thus provide a basis for further characterising
and understanding biodiversity. 
Abstract. We investigate the evolutionary processes behind the development and optimization of multiple threads of execution in digital organisms using the avida platform, a software package that implements Darwinian evolution on populations of self-replicating computer programs. The system is seeded with a linearly executed ancestor capable only of reproducing its own genome, whereas its underlying language has the capacity for multiple threads of execution (i.e., simultaneous expression of sections of the genome). We witness the evolution of multi-threaded organisms and track the development of distinct expression patterns. Additionally, we examine both the evolvability of multi-threaded organisms and the level of thread differentiation as a function of environmental complexity, and find that differentiation is more pronounced in complex environments.



1 Introduction 

Evolution has traditionally been a formidable subject to study due to its gradual pace in the natural world. One successful method uses microscopic organisms with generation times as short as an hour, but even this approach has difficulties: it is still impossible to perform measurements without disturbing the system, and the time-scales needed to see significant adaptation remain on the order of weeks, at best¹. Recently, a new tool has become available to study these problems in a computational medium: the use of populations of self-replicating computer programs. These “digital organisms” are limited in speed only by the computers used, with generations in a typical trial taking a few seconds.

Of course, many differences remain between digital and simple biochemical life, and we address one of the critical ones in this paper. In nature, many chemical reactions and genome expressions occur simultaneously, with a system of gene regulation guiding their interactions. However, in digital organisms only one instruction is executed at a time, implying that no two sections of the program can directly interact. An obvious extension is therefore to examine the dynamics of adaptation in artificial systems that have the capacity for more than one thread of execution (i.e., an independent CPU with its own instruction pointer, operating on the same genome).

¹ Populations of E. coli introduced into new environments begin adaptation immediately, with significant results apparent in a few weeks [3].

Work in this direction began in 1994 with Thearling and Ray using the pro- 
gram tierra [7]. These experiments were initialized with an ancestor that creates 
two threads each copying half of its genome, thereby doubling its replication 
rate. Evolution then produces more threads up to the maximum allowed [11]. 
In subsequent papers [12, 9] this research was extended to organisms whose threads do not perform identical operations. This was done in an enhanced version
of the tierra system (“Network Tierra” [8]), in which multiple “islands” of 
digital organisms are processed on real-world machines across the Internet. In 
these later experiments, the organisms exist in a more complex environment in 
which they have the option of seeking other islands on which to place their off- 
spring. The ancestor used for these experiments reproduces while searching for 
better islands using independent threads. Thread differentiation persists only 
when island-jumping is actively beneficial; that is, when a meaningful element 
of complexity is present in the environment. 

In the experiments reported here, we survey the initial emergence of multiple
threads and study their subsequent divergence in function. We then investigate 
the hypothesis that environmental complexity plays a key role in the pressure 
for the thread execution patterns to differentiate. 

2 Experimental Details 

We use the avida platform to examine the development of multi-threading in 
populations exposed to different environments at distinct levels of complexity, 
comparing them to each other and to controls that lack the capacity for multiple 
threads. 



2.1 The Avida Platform 

Avida is an auto-adaptive genetic system designed for use as a platform in Arti- 
ficial Life research. The avida system comprises a population of self-reproducing 
strings of instructions that adapt to both an intrinsic fitness landscape (self- 
reproduction) and an externally imposed (extrinsic) bonus structure provided 
by the researcher. 

A standard avida organism is a single genome composed of a sequence of 
instructions that are processed as commands to the CPU of a virtual computer. 
This genome is loaded into the memory space of the CPU, and the execution 
of each instruction modifies the state of that CPU. In addition to the memory, 
a virtual CPU has three integer registers, two integer stacks, an input/output 
buffer, and an instruction pointer. In standard avida experiments, an organism’s 
genome has one of 28 possible instructions at each line. The virtual CPUs are
Turing-complete, and therefore do not explicitly limit the ability for the popu- 
lation to adapt to its computational world. For more details on avida, see [5]. 

To allow different sections of a program to be executed in parallel, we have 
implemented three new instructions. A new thread of execution is initiated with 
fork-th. This thread has its own registers, instruction pointer, and a single 
stack, all initialized to be identical to the spawning thread. The second stack 
is shared to facilitate communication among threads. Only the new thread will 
execute the instruction immediately following the fork-th; the original will skip it, enabling the threads to act and adapt independently. If, for example, a jump instruction is at this location, it may cause the new thread to execute a different section of the program (segregated differentiation), whereas a mathematical operation could modify the outcome of subsequent calculations (overlapping differentiation). On the other hand, a no-operation instruction at this position allows the threads to progress identically (non-differentiated). We have also implemented kill-th, an instruction that halts the thread executing it, and id-th, which places a unique thread identification number in a register, allowing the organism to conditionally regulate the execution of its genome.
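The thread instructions can be modelled roughly as follows. This Python sketch is hypothetical and is not avida's implementation: it represents instruction positions as list indices, assumes the genome is circular, treats a fork beyond the two-thread limit as a no-op, and lets `id_th` write into a register chosen by the caller.

```python
from dataclasses import dataclass, field

@dataclass
class Thread:
    thread_id: int
    ip: int                                                     # instruction pointer
    registers: list = field(default_factory=lambda: [0, 0, 0])  # three integer registers
    private_stack: list = field(default_factory=list)           # per-thread stack

@dataclass
class Organism:
    genome: list
    threads: list
    shared_stack: list = field(default_factory=list)  # second stack, shared by all threads
    next_id: int = 1
    max_threads: int = 2

    def fork_th(self, parent):
        """Execute fork-th at parent.ip; return the new thread, or None at the limit."""
        n = len(self.genome)
        fork_pos = parent.ip
        if len(self.threads) >= self.max_threads:
            parent.ip = (fork_pos + 1) % n   # assumption: a failed fork acts as a no-op
            return None
        # The new thread copies registers and private stack, and runs the next instruction.
        child = Thread(self.next_id, (fork_pos + 1) % n,
                       list(parent.registers), list(parent.private_stack))
        self.next_id += 1
        self.threads.append(child)
        parent.ip = (fork_pos + 2) % n       # the original thread skips that instruction
        return child

    def kill_th(self, thread):
        """Halt the thread executing kill-th (removed by identity, not equality)."""
        self.threads = [t for t in self.threads if t is not thread]

    def id_th(self, thread, reg=0):
        """Place the thread's unique id in one of its registers (register choice assumed)."""
        thread.registers[reg] = thread.thread_id
```

Because only the new thread executes the instruction after fork-th, whatever sits there (a jump, a math operation, or a no-op) determines which kind of differentiation described above can arise.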

We performed experiments on three environments of differing complexity, 
with both the extended instruction set that allows multiple expression patterns 
and the standard instruction set as a control. As individual trials can differ 
extensively in the course of their evolution, each setup was repeated in two 
hundred trials to gain statistical significance. The experiments were performed 
on populations of 3600 digital organisms for 50,000 updates². Mutations were set at a probability of 0.75% for each instruction copied, and a 5% probability for an instruction to be inserted or removed in the genome of a new offspring.
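A sketch of this mutation scheme, assuming instructions are encoded as integers 0 to 27. Whether the 5% insertion/deletion chance is a single event or applies separately to insertion and to deletion is not specified here; the sketch uses one event, split evenly between insertion and deletion, which is an assumption.

```python
import random

N_INSTRUCTIONS = 28       # instructions encoded as integers 0..27 (assumption)
COPY_MUT_PROB = 0.0075    # 0.75% per instruction copied
INS_DEL_PROB = 0.05       # 5% per offspring genome

def copy_genome(genome, rng):
    """Return a mutated copy of `genome`; the parent list is left unchanged."""
    # Copy mutations: each copied instruction may be replaced by a random one.
    child = [rng.randrange(N_INSTRUCTIONS) if rng.random() < COPY_MUT_PROB else inst
             for inst in genome]
    # One insertion-or-deletion event with probability INS_DEL_PROB (assumed 50/50 split).
    if rng.random() < INS_DEL_PROB:
        if rng.random() < 0.5:
            child.insert(rng.randrange(len(child) + 1), rng.randrange(N_INSTRUCTIONS))
        elif len(child) > 1:
            del child[rng.randrange(len(child))]
    return child
```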

The first environment (I) is the least complex, with no explicit environmental factors to affect the evolution of the organisms; that is, the optimization of replication rate is the only adaptive pressure on the population. The next environment (II) has collections of numbers that the organisms may retrieve and manipulate. We can view the successful computation of any of twelve logical operations that we reward³ as beneficial metabolic chemical reactions, and speed up the virtual CPU accordingly; more complex tasks result in larger speed-ups. If the speed increase is more than the time expended to perform the task, the new functionality is selected for. The final environment (III) studied is the most complex, with 80 logic operations rewarded.
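Two aspects of this reward scheme can be made concrete. Tasks are bitwise functions of 32-bit inputs built from nand, and a task is favoured only when its speed-up outweighs its cost. The sketch below shows the textbook nand constructions and a hypothetical cost/benefit check, `task_pays_off`, which assumes the speed-up divides the time needed per gestation cycle.

```python
MASK = 0xFFFFFFFF  # tasks operate on 32-bit integers

def nand(a, b):
    """Bitwise NAND restricted to 32 bits."""
    return ~(a & b) & MASK

def not_(a):
    return nand(a, a)

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    return nand(not_(a), not_(b))

def task_pays_off(gestation_cycles, task_cycles, speedup):
    """Hypothetical check: does performing the task shorten the time per offspring?

    Assumes the reward divides the time per gestation cycle by `speedup`.
    """
    return (gestation_cycles + task_cycles) / speedup < gestation_cycles
```

For instance, a 20-cycle task added to a 100-cycle gestation pays off with a speed-up of 2.0 but not with 1.1.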



² An update represents the execution of an average of 30 instructions per program in the population. 50,000 updates equates to approximately 9000 generations and takes about 20 hours of execution on a Pentium Pro 200. The data and complete genomes are available at http://www.krl.caltech.edu/avida/pubs/ecal99/ .

³ The completion of a logic operation involves the organism drawing one or more 32-bit integers from the environment, computing a bitwise logical function using one or more nand instructions, and outputting the result back into the environment.
A record is maintained of the development of the population, including
the genomes of the most abundant organisms. For each trial, these dominant 
genomes are analyzed to produce a time series of thread use and differentiation. 



2.2 Differentiation Metrics

The following measures and indicators keep track of the functional differentiation of code. We keep this initial analysis manageable by setting a maximum of two threads available to run simultaneously. Relaxing this constraint leads to the development of more than two threads with characteristically similar interactions.

Thread Distance measures the spatial divergence of the two instruction pointers. This measurement is the average distance (in units of instructions) between the execution positions of the individual threads. If this value becomes high relative to the length of the genome, it is an indication that the threads are segregated, executing different portions of the genome at any one time, whereas if it is low, they likely move in lock-step (or slightly offset) with nearly identical executions. Note, however, that if two instruction pointers execute the code offset by a fixed number of instructions, but otherwise identically, the thread distance is an inflated measure of differentiation because the temporal offset does not translate into differing functionality.
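Thread distance can be sketched as follows, assuming the two instruction pointers are sampled at each step while both threads run; the circular wrap-around option is an assumption.

```python
def thread_distance(ip_pairs, genome_length, circular=True):
    """Average distance, in instructions, between the two threads' instruction pointers.

    ip_pairs: (ip_thread_1, ip_thread_2) samples taken while both threads run.
    circular: measure around a circular genome (an assumption) instead of linearly.
    """
    total = 0
    for a, b in ip_pairs:
        d = abs(a - b)
        if circular:
            d = min(d, genome_length - d)
        total += d
    return total / len(ip_pairs)
```

Two threads running identical code three instructions apart score 3.0 at every step, which illustrates the inflation caused by a fixed offset.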

Code Differentiation distinguishes execution patterns with differing behavior. A count is kept of how often each thread executes each portion of the genome. The code differentiation is the fraction of instructions in the genome for which these counts differ between threads. Thus, this metric is insensitive to the ordering of execution.
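Given per-line execution counts for each thread, code differentiation reduces to the fraction of lines whose counts do not match. A minimal sketch, with the count lists as hypothetical inputs:

```python
def code_differentiation(counts1, counts2):
    """Fraction of genome lines whose execution counts differ between the two threads."""
    differing = sum(1 for a, b in zip(counts1, counts2) if a != b)
    return differing / len(counts1)
```

Any mismatch counts fully, so a line executed 5 times by one thread and 4 times by the other weighs as much as a line executed by only one thread.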

Execution Differentiation is a more rigorous measure than code differentiation. It uses the same counters, taking into consideration the difference in the number of times the threads execute each instruction. Thus, if one thread executes a line 5 times and the other executes it 4 times, it would not contribute as much towards differentiation as an instruction executed all 9 times by one thread, and not at all by the other. This metric totals these differences in execution counts at each line and then divides the sum by the total number of multi-threaded executions. Thus, if the threads are perfectly synchronized, there is zero execution differentiation, and if only one thread exclusively executes each line, this metric is maximized at one. An execution differentiation of 0.5 indicates that half of the instructions did not have matched executions in each thread.
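Execution differentiation can be sketched the same way. Here the total number of multi-threaded executions is taken to be the sum of both threads' counts, an assumption consistent with the bounds of 0 and 1 stated above.

```python
def execution_differentiation(counts1, counts2):
    """Sum of per-line count differences divided by the total number of executions."""
    diff = sum(abs(a - b) for a, b in zip(counts1, counts2))
    total = sum(counts1) + sum(counts2)
    return diff / total if total else 0.0
```

The 5-versus-4 line from the text contributes 1/9, while a line executed 9 times by one thread only contributes 9/9.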



3 Evolution of Multi-Threaded Organisms 



For our initial investigations, we focus on the 200 trials in environment III (the 
most complex), with the extended instruction set, allowing for multi-threading. 
Fig. 1. The time progression of organisms learning to use multiple threads, averaged over 200 trials. (A) The fraction of trials which thread at all, and (B) the average fraction of time organisms spend using both threads at once. The data displayed here are for the first 5000 updates of 50,000-update experiments in environment III.



3.1 Emergence of Multiple Execution Patterns 

Describing a universal course of evolution in any medium is not feasible due to the numerous random and contingent factors that play key roles. However, there are a number of distinct trends, which we discuss below.

Let us first consider the transition of organisms from a purely linear execution to the use of multiple threads. In Fig. 1A, we see that most populations do develop a secondary thread near the beginning of their evolution. Secondary
threads come into use as soon as they grant any benefit to the organisms. The 
most common way this occurs is by having a fork-th and a kill-th appear 
around a section of code, which the threads thereby move through in lock-step, 
performing computations twice. Multiple completions of a task provide only a 
minor speed bonus, but this is often sufficient to warrant a double execution. 

Once multiple execution has set in, it will be optimized with time. Smaller 
blocks of duplicated code will be expanded, and larger sections will be used 
more productively, sometimes even shrinking to improve efficiency. Once multiple 
threads are in use, differentiation follows. 



3.2 Execution Patterns in Multi-threaded Organisms 

A critical question is “What effect does a secondary thread have on the process 
of evolution?” The primary measure to denote a genome’s level of adaptation to 
an environment is its fitness. The fitness of a digital organism is measured as the 
number of offspring it produces per unit time, normalized to the replication rate 
of the ancestor. In all experiments, the fitness of the dominant genotype starts 
at one and increases as the organisms adapt. Fitness improvements come in two 
forms: the maximization of CPU speed by task completion, and the minimization 
of gestation time. As all tasks must be computed each gestation cycle to maintain a reward, this gestation time minimization includes the optimization of tasks in addition to speed-ups in the replication process. The average progression of fitness with time is shown in Fig. 2A for both the niche with the expanded instruction set that allows multiple threads, and the standard, linear execution niche as a control.

Fig. 2. (A) Average fitness as a function of time (in updates) for the 200 environment III trials. Most increases to fitness occur as a multiplicative factor, requiring fitness to be displayed on a logarithmic scale. (B) Average sequence length for the linear execution experiments (solid line) and the multiple execution experiments (dashed line).

Contrary to expectations, the niche that has additional threads available 
gives rise to a slower rate of adaptation. However, the average length of the 
genomes (Fig. 2B) reveals that the code for these marginally less fit organisms 
is stored using 40% fewer instructions, indicating a denser encoding. Indeed, the 
very fact that multi-threading develops spontaneously implies that it is bene- 
ficial. How then can a beneficial development be detrimental to an organism’s 
fitness? 

Inspection of evolved genomes has allowed us to determine that this code compression is accomplished by overlapping execution patterns that differ in their final product. Fig. 3A displays an example genome. The initial thread of execution (the inner ring) begins in the D “gene” and proceeds clockwise. The execution of D divides the organism when it has a fully developed copy of itself ready. This is not the case for this first execution, so the gene fails with no effect on the organism. Execution progresses into gene $C_0$, where computational tasks are performed, increasing the CPU speed. Near the center of $C_0$, a fork-th instruction is executed, initiating secondary execution (of the same code) at line 27 and giving rise to gene $C_2$. The primary thread continues to line 55, the S gene, where genome size is calculated and the memory for its offspring is allocated. Next, the primary instruction pointer runs into gene R, the copy loop, where replication occurs. It is executed once for each of the 99 instructions in the genome (hence its dark color in the figure). When this process is complete, it moves on through gene $I_0$, shuffling numbers around, and re-enters gene D for a final division.
Fig. 3. A: Execution patterns for an evolved avida genome. The inner ring displays instructions executed by the initial thread, and the outer ring by the secondary thread. Darker colors indicate more frequent execution. B: Genome structure of the phage φX174. The promoter sequence for gene A* is entirely within gene A, causing the genes to express the same series of amino acids from the overlapped portion. Genes B, E, and K are also entirely contained within others, but with an offset reading frame, such that different amino acids are produced.



During this time, the secondary thread executes gene C2, computing a few basic logical operations. C2 ends with a jump-f (jump forward) instruction that initially fails. Passing through gene I1, numbers are shuffled within the thread, and the jump at line 72 diverts execution back to the beginning of the organism. From this point on, its execution loops through C1 and C2 a total of 10 times, using the results of each pass as inputs to the next and computing different tasks each time. Note that for this organism, the secondary thread is never involved in replication. Similar overlapping patterns appear in natural organisms, particularly viruses. Fig. 3B shows a gene map of the phage φX174 containing portions of genetic code that are expressed multiple times, each resulting in a distinct protein [13]. Studies of evolution in the overlapping genes of φX174 and other viruses have isolated the primary characteristic hampering evolution. Multiple encodings in the same portion of a genome require that mutations be neutral (or beneficial) in their net effect over all expressions; otherwise they are selected against. Fewer neutral mutations result in reduced variation and, in turn, slower adaptation. It has been shown that in both viruses [4] and avida organisms [6], overlapping expressions have between 50 and 60% of the variation of the non-overlapping areas in the same genome, causing genotype space to be explored at a slower pace.

In higher organisms, multiple genes do develop that overlap in a portion of their encoding, but these overlaps are believed to be evolved out through gene duplication and specialization, leading to improved efficiency [2]. Unfortunately, viruses and avida organisms are both subject to high mutation rates with no error-correction abilities. This, in turn, causes a strong pressure to compress the genome, thereby minimizing the target for mutations. As this is an immediate advantage, it is typically seized, although it leads to a decrease in the adaptive abilities of the population in the long term.

Fig. 4. Differentiation measures averaged over all trials for each experiment. (A) Thread Distance, (B) Fractional Thread Distance, (C) Code Differentiation, (D) Expression Differentiation. Experiments from environment III (solid line), environment II (dashed line), and environment I (dotted line).



3.3 Environmental Influence on Differentiation 

Now that we have witnessed the development of multiple threads of execution 
in avida, let us examine the impact of environmental complexity on this process. 
Populations in all environments learn to use their secondary thread quite rapidly, 
but show a marked difference in their ability to diverge the threads into distinct 
functions. In Fig. 4A, average Thread Distance is displayed for all trials in each environment, showing a positive correlation between the divergence of threads and the complexity of the environment in which they evolve.

More complex environments provide more information to be stored within the organism, promoting longer genomes [1] and possibly biasing this measure. To account for this, we consider the average thread distance normalized to the length of the organisms, displayed in Fig. 4B. When threads fully differentiate, they often execute neighboring sections of code regardless of the length of the genome they are in, which biases this normalized measure in the opposite direction: longer genomes need their threads to be further spatially differentiated in order to obtain an equivalent fractional thread distance. Thus, the fact that more complex environments give rise to a marginally higher fractional distance is quite significant.
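The opposite bias described above can be shown with a minimal sketch. It assumes that Fractional Thread Distance is simply the Thread Distance divided by genome length, as the normalization is described here. The full definition of Thread Distance is given earlier in the paper and is not reproduced. The numbers are hypothetical.

```python
def fractional_thread_distance(thread_distance: float, genome_length: int) -> float:
    """Thread distance normalized to genome length (assumed definition)."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return thread_distance / genome_length


# Two fully differentiated threads running neighboring code blocks at the
# same absolute separation score lower in a longer genome.
print(fractional_thread_distance(20, 50))   # 0.4
print(fractional_thread_distance(20, 100))  # 0.2
```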

Interestingly, Code Differentiation (Fig. 4C) does not firmly distinguish the environments, averaging about 0.5. In fact, the distribution of code differentiation turns out to be nearly uniform. This indicates that the portion of the genome involved with the differentiated threads is similarly distributed across complexity levels. Execution Differentiation (the fraction of executions that occurred differently between threads, shown in Fig. 4D), however, once again correlates positively with environmental complexity. The degree of differentiation between the execution patterns is much more pronounced in the more complex environments.
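The two execution-based measures can be sketched from per-site execution counts recorded separately for each thread. Execution Differentiation is labeled Expression Differentiation in the Fig. 4 caption. The exact definitions appear earlier in the paper, so the formulas below are assumptions for illustration only. Code differentiation is taken as the fraction of executed sites run by exactly one thread. Execution differentiation is taken as the fraction of all executions not matched by the other thread at the same site.

```python
from typing import Sequence


def code_differentiation(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of executed sites that only one of the two threads executed."""
    executed = [(x, y) for x, y in zip(a, b) if x > 0 or y > 0]
    if not executed:
        return 0.0
    exclusive = sum(1 for x, y in executed if (x > 0) != (y > 0))
    return exclusive / len(executed)


def execution_differentiation(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of all executions that are unmatched between the threads."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical per-site execution counts for a 4-instruction genome.
thread1 = [3, 3, 0, 1]
thread2 = [3, 0, 2, 1]
print(code_differentiation(thread1, thread2))       # 0.5
print(execution_differentiation(thread1, thread2))  # 5/13
```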



4 Conclusions 

We have witnessed the development and differentiation of multi-threading in dig- 
ital organisms, and exhibited the role of environmental complexity in promoting 
this differentiation. Although this is an inherently complex process, the ability 
to examine almost any detail and dynamic within the framework of avida pro- 
vides insight into what we believe are fundamental properties of biological and 
computational systems. 

The patterns of expression (lock-step, overlapping, and spatial differentiation) are selected by balancing the “physiological” costs of execution and differentiation against the implicit effects of mutational load. Clearly, multiple threads executing a single region of the genome provide for additional use of that region.
The benefit is in the form of additional functionality and a reduction in the mu- 
tational load required for that functionality. Within the context of this thinking, 
the correlation between environmental complexity and the usage of multiple 
threads makes a great deal of sense: multiple threads are advantageous only if 
they can provide additional functionality. 

However, we have also witnessed the cost side of this equation: when a gene or gene product is used in multiple pathways, variation is reduced, because the changes to each gene must result in a net benefit to the organism. We observed a negative
correlation between rates of adaptation and use of multiple threads. Further- 
more, the ability to analyze the entropy of each site in the genome quantifies the 
loss in variability predicted by this hypothesis. This entropy analysis has been 
carried out in a biological context by Schneider [10], opening up opportunities 
to verify our results. 

Implications of this work with potentially far-reaching consequences for Computer Science involve the study of how the individual threads interact and what techniques the organisms implement to obtain mutually robust operations. The internal interactions within computer systems lack the remarkable robustness of biological systems to a noisy and often changing environment. Life as we know it would never have reached such vast multi-cellularity if, every time a single component failed or otherwise acted unexpectedly, the whole organism shut down.
Clearly, we are still taking the first steps in developing systems of computer
programs that interact on similarly robust levels. Here we have performed ex- 
periments on a simple evolutionary system as a step towards deciphering these 
biological principles as applied to digital life. In the future, we plan to add ex- 
plicit costs for multi-threading that depend on the local availability of resources 
for thread execution. Systems at levels of integration anywhere near that of bio- 
logical life are still a long way off, but more concrete concepts such as applying 
principles from gene regulation to develop self-scheduling parallel computers may 
be much closer. 
Abstract. Since the early beginnings of Evolutionary Computation, Finite State
Machines (FSMs) have been applied to model organisms. We present a new ap- 
proach to evolve such artificial organisms. The FSMs are subject to a difficult 
navigation and searching task in heterogeneous environments. We give a defini- 
tion of FSM-species and investigate their formation. The results show that spe- 
cies are formed as the organisms agree on a common ‘genetic broadcast lan- 
guage’ and take advantage of the fruitful effects of recombination. As observed 
in natural ecosystems, higher abiotic diversity leads to higher biotic diversity. 



1 Introduction 

The emergence of sex and the conditions of species formation are biological problems in which ALife models may help to sharpen the experimental and theoretical discussion on critical aspects. Here we study species formation by simulating evolutionary
adaptation of finite automata. In this approach, termed ‘Evolving Finite State Ma- 
chines’ (EFSM), state sets and input alphabets can be varied and are subject to evolu- 
tion. Questions that will be addressed are: 

• Are there optimal cardinalities of state sets or input alphabets? 

• Do stable multi-species-populations emerge or only single-species-populations? 

• How does environmental diversity influence the resulting number of species? 

Two types of Evolutionary Algorithms (EAs) have hitherto been used to adapt FSMs to perform a given task: Evolutionary Programming (EP) and Genetic Algorithms (GAs). The EP-driven approach is described in detail in Fogel et al. [6], where the task of the FSM consists in predicting symbol sequences. Fogel et al. considered intelligent behavior to be the ability to predict the future environment and to translate this knowledge into goal-directed action. As is typical of EP, only mutation and no recombination was used, and the parameters were not coded in binary format. Mutation was implemented by randomly replacing a transition, an output symbol, or the initial state. Additionally, the state set could be enlarged or reduced by one state at random. Hence the number of states was not predetermined and became a result of evolutionary adaptation. In [5], basically the same approach was applied to the iterated prisoner’s dilemma.

Jefferson and Collins applied a GA in order to adapt FSMs (cf. [9], [3], [4]). In their “Genesys/Tracker” system the FSMs were interpreted as artificial ants, which had to solve the task of following a given path on a 32×32 grid (the ‘John Muir Trail’). The outstanding features of the GA were multi-point crossover, a large population size (65536 individuals) and high selective pressure (only the top 1% or 10% were selected for mating). As a consequence of using a GA, the state set had to be fixed.

For both approaches, the question remains how powerful the algorithms at work really are. No comparisons to FSMs designed by humans or to other methods were performed. Within a few minutes, we were able to manually design FSMs that did as well as the evolved ones reported in the publications. More importantly, spontaneous species formation cannot be investigated with these two approaches (cf. section 4).



2 The Evolving FSMs Approach 

In this section we briefly present our new approach, which we termed Evolving Finite State Machines. EFSM is a hybrid EA that combines ideas and techniques from different standard EAs and makes use of the special data structure of FSMs. An FSM can be understood as a virtual creature possessing its own artificial intelligence, as has been pointed out in [6]. In EFSM we observe the evolution of high-level goal-seeking behavior, which is briefly described in section 3.



2.1 Data Structure and Representation of the FSM 

The FSM is implemented as a Mealy-type automaton. With the notation of [8], the FSM M is given by M = (Q, Σ, Δ, δ, λ, q0), where Q is the finite state set, Σ the input alphabet, Δ the output alphabet, δ the transition function, λ the output function and q0 the initial state. Let m be the cardinality of the input alphabet, n the cardinality of the state set, and q0 = 0. Then the automaton is fully specified by the maps δ: Q×Σ → Q and λ: Q×Σ → Δ. This is done through two m×n matrices, the transition table T and the output table O. Whereas T is composed of integers, O contains bits, integers and real numbers to enable a rich output, so that the FSMs can solve difficult tasks.
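The tabular Mealy machine just described can be sketched as follows. Both δ and λ are stored as tables indexed by (state, input symbol), with q0 = 0. The sketch indexes rows by state, which is the transpose of the m×n orientation above. The output tuple of one flag and three reals follows the optimizer outputs described in section 3. The type and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical output layout: (move flag, three real-valued parameters).
Output = Tuple[bool, float, float, float]


@dataclass
class MealyFSM:
    T: List[List[int]]      # T[q][s]: next state for state q and input symbol s
    O: List[List[Output]]   # O[q][s]: output emitted on that transition
    state: int = 0          # q0 = 0

    def step(self, symbol: int) -> Output:
        """Emit lambda(q, s), then move to delta(q, s)."""
        out = self.O[self.state][symbol]
        self.state = self.T[self.state][symbol]
        return out


# Two states, two input symbols.
fsm = MealyFSM(
    T=[[0, 1], [1, 0]],
    O=[[(False, 0.1, 0.0, 0.0), (True, 0.2, 0.0, 0.0)],
       [(True, 0.3, 0.0, 0.0), (False, 0.4, 0.0, 0.0)]],
)
outs = [fsm.step(s) for s in [1, 0, 1]]
print([o[0] for o in outs], fsm.state)  # [True, True, False] 0
```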

Experimental results show the advantages of a direct representation of data structures (cf. [1], [10]). Hence, we did not apply binary coding to any of the integer or real parameters. The genomes have to be of variable length to allow state sets and input alphabets of variable cardinality in the population, in contrast to Evolution Strategies (ES), EP, GAs and (partly) classifier systems (CFSs). In this respect EFSM is similar to Genetic Programming (GP). As in ES, we rely on self-adaptive search to ease the difficult task of finding good parameter settings (cf. [1], [7], [10], [11]). The strategy parameters, e.g. crossover and mutation probabilities or crossover types, are therefore coded within the genome and are subject to evolutionary adaptation.

In summary, our genome contains a vector of strategy parameters and two matrices 
of variable size, the output and the transition table. 
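The genome layout summarized above can be sketched as a container holding a strategy-parameter vector next to the two variable-size tables. The strategy fields shown (mutation and recombination probabilities and a recombination-type index) are examples drawn from the text. Their names, like the `structure` helper, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Genome:
    strategy: dict          # self-adapted strategy parameters
    T: List[List[int]]      # transition table: one row per state, one column per input symbol
    O: List[List[Tuple]]    # output table with the same shape as T

    def structure(self) -> Tuple[int, int]:
        """Return (|Q|, |Sigma|); both may differ between individuals."""
        return len(self.T), (len(self.T[0]) if self.T else 0)


g = Genome(
    strategy={"p_mut": 0.04, "p_rec": 0.4, "rec_type": 2},
    T=[[0, 1, 2], [2, 0, 1], [1, 2, 0]],
    O=[[(False, 0.0, 0.0, 0.0)] * 3 for _ in range(3)],
)
print(g.structure())  # (3, 3)
```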

2.2 Fitness, Reproduction and Selection

Unlike other EAs, our approach distinguishes between reproduction and selection. We define reproduction as the event of creating new structures in the sense of “making copies”. Selection is the event of “deleting” structures. In EFSM, selective pressure is employed in both stages. Reproduction and selection cause superior FSMs to reproduce more frequently and to survive longer. Superiority or inferiority is expressed by the fitness, which is derived from the given task. Like GAs, EFSM makes optional use of different fitness scaling schemes and reproduction schemes as presented in [7].

The FSMs reproduced by one of these schemes undergo recombination and mutation. The new FSMs compete for survival against the parent population during the selection phase. As in ES or EP, the worst individuals are deleted until the desired population size is reestablished. If uniform reproduction is applied in the reproductive phase, this combination of reproduction and selection is similar to “(μ+λ)-selection” for ES. In GA terms it would be called a steady-state GA.
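The separation of reproduction and selection can be sketched as one generation step. The sketch assumes that fitness is minimized and non-negative, as with the landscapes used later where values near 0 are best. Reproduction uses roulette-wheel sampling on the inverted fitness 1/(1+f). That inversion is our assumption, since the text only states that several scaling and reproduction schemes are available. Offspring are varied by a caller-supplied function, which is where recombination and mutation would go.

```python
import random
from typing import Callable, List, TypeVar

Ind = TypeVar("Ind")


def steady_state_generation(pop: List[Ind], fitness: Callable[[Ind], float],
                            n_offspring: int, vary: Callable[[Ind], Ind],
                            rng: random.Random) -> List[Ind]:
    """One generation of reproduction followed by (mu+lambda)-style deletion.
    Fitness is minimized and assumed to be non-negative."""
    mu = len(pop)
    # Reproduction: roulette wheel on inverted fitness (lower is better).
    weights = [1.0 / (1.0 + fitness(ind)) for ind in pop]
    parents = rng.choices(pop, weights=weights, k=n_offspring)
    offspring = [vary(p) for p in parents]  # recombination and mutation go here
    # Selection: offspring compete with parents; the worst are deleted.
    survivors = sorted(pop + offspring, key=fitness)
    return survivors[:mu]


rng = random.Random(0)
pop = [rng.uniform(0.0, 1.0) for _ in range(100)]  # toy individuals: value = fitness


def vary(x: float) -> float:
    return min(1.0, max(0.0, x + rng.gauss(0.0, 0.05)))


new_pop = steady_state_generation(pop, lambda x: x, 50, vary, rng)
print(len(new_pop), min(new_pop) <= min(pop))  # 100 True
```

Because parents survive unless displaced, the best fitness can never get worse from one generation to the next.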



2.3 Recombination 

EFSM relies on recombination, as do GAs, GP and CFSs. Recombination turns out to be of special significance for investigating species formation. The vector of strategy parameters undergoes the traditional single-point crossover of GAs. For the transition and output tables, six different recombination types were implemented in EFSM. To be concise, we do not specify the recombination types here. It should suffice to mention that the recombination operators try to exploit the ‘semantic’ coherence that may exist in the output and transition tables. More details are available at [15]. By respecting the special ‘grammar’ of FSMs, we prevent recombination from acting as a macro-mutation operator, as it could in GP (cf. [2]). Instead, it produces meaningful structures with high probability.
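The strategy-vector crossover is specified above as the standard single-point crossover of GAs. The table operator sketched next to it is only one hypothetical example of a structure-respecting operator, in the spirit of the "exchange of blocks of the transition and output tables" setting mentioned in section 5.1. It swaps a block of whole rows (states) in both tables at once, so each transition stays paired with its output. It is not a reconstruction of EFSM's six operators. It refuses parents of differing structure.

```python
import copy
import random
from typing import List, Sequence, Tuple


def single_point_crossover(a: Sequence[float], b: Sequence[float],
                           rng: random.Random) -> Tuple[List[float], List[float]]:
    """Standard GA single-point crossover on two equal-length vectors."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    cut = rng.randrange(1, len(a))
    return list(a[:cut]) + list(b[cut:]), list(b[:cut]) + list(a[cut:])


def exchange_state_block(t1, o1, t2, o2, lo: int, hi: int):
    """Hypothetical table operator: swap rows lo..hi-1 (whole states) of the
    transition and output tables together. Parents must share the same
    structure (same number of states and input symbols)."""
    if (len(t1), len(t1[0])) != (len(t2), len(t2[0])):
        raise ValueError("parents must share the same table structure")
    child1 = (t1[:lo] + t2[lo:hi] + t1[hi:], o1[:lo] + o2[lo:hi] + o1[hi:])
    child2 = (t2[:lo] + t1[lo:hi] + t2[hi:], o2[:lo] + o1[lo:hi] + o2[hi:])
    return copy.deepcopy(child1), copy.deepcopy(child2)


rng = random.Random(1)
s1, s2 = single_point_crossover([0.04, 0.4, 1.0], [0.1, 0.2, 3.0], rng)
```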



2.4 Mutation 

In EFSM we have different mutation types for different data types. All boolean parameters mutate by simple bit inversion, as in GAs. All integer parameters are mutated by choosing a new random value from the feasible range according to a uniform distribution. For real values there are two different types of mutation. The first simply chooses a new number from a uniform distribution (as for integers). This mutation type is applied to real-valued strategy parameters. The second type uses a Gaussian distribution: a normally distributed random number is generated and added to the current value of the parameter. This second mutation type is applied to two of the output parameters and is a simpler version of the mutation known from EP and ES.
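The per-type mutation operators listed above can be sketched as follows: bit inversion for booleans, uniform resampling for integers and real-valued strategy parameters, and additive Gaussian noise for the two real output parameters. The ranges, the standard deviation `sigma`, and the helper `mutate_transitions` are hypothetical. Uniform resampling may also return the current value, and the text does not say whether EFSM excludes it.

```python
import random
from typing import List


def mutate_bool(x: bool) -> bool:
    return not x                       # bit inversion, as in GAs


def mutate_int(lo: int, hi: int, rng: random.Random) -> int:
    return rng.randint(lo, hi)         # uniform over the feasible range [lo, hi]


def mutate_real_uniform(lo: float, hi: float, rng: random.Random) -> float:
    return rng.uniform(lo, hi)         # used for real-valued strategy parameters


def mutate_real_gauss(x: float, sigma: float, rng: random.Random) -> float:
    return x + rng.gauss(0.0, sigma)   # additive Gaussian noise (output parameters)


def mutate_transitions(table: List[List[int]], n_states: int, p_mut: float,
                       rng: random.Random) -> List[List[int]]:
    """Resample each transition target with probability p_mut."""
    return [[mutate_int(0, n_states - 1, rng) if rng.random() < p_mut else q
             for q in row] for row in table]


rng = random.Random(7)
mutated = mutate_transitions([[0, 1], [1, 0]], n_states=2, p_mut=0.04, rng=rng)
```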

Further details on the approach can be found at [15]. The approach has been realized as an MS Windows application, which can be downloaded together with a complete user guide.
3 The FSM as a Virtual Creature

As in [6], [9], we view FSMs as ALife organisms. The task providing the fitness criterion is to navigate in a given heterogeneous 3D environment. Our ALife organisms seek the lowest point in a multimodal surface, subject to a time constraint. This choice has various advantages. First, navigation and path planning have a long tradition dating back to Wilson’s animat [14], which has produced several other examples for comparison. Second, we use well-known functions as environments, defining optimization problems that are hard to solve numerically. Hence, the performance of EFSM can be evaluated against traditional optimization methods as well as against EAs. Third, visualization of the navigation behavior gives direct access to the FSMs’ capabilities. This allows an easy phenotypic interpretation even of complicated multi-state FSMs.

The FSM is considered the ‘brain’ of the creatures, termed ‘optimizers’. Each optimizer is located at one point and has a sensor that measures at a different point of the environment. Whilst the environment is quasi-infinite, the sensor of the optimizer has only a limited precision of measurement. In each measuring event the sensor generates a finite input symbol; hence the measurement precision is limited by the number of input symbols. The input is processed by the Mealy-type FSM. An output is created depending on the state of the FSM and its input. According to the output, the effectors can cause three types of actions: the optimizer may move to the point where the last measurement was performed, may turn by a certain angle, and may change the distance at which it measures. The output causing these actions consists of four parameters (one binary and three real). For a machine with 4 states and 4 input symbols, the output table already consists of 16 bits and 48 real parameters, and the transition table adds 16 integers to the FSM. For more details about the navigation, consult [15].
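The parameter counts quoted above follow directly from the table sizes. Each of the |Q|·|Σ| table entries holds one transition target (an integer) and one output made of one bit and three reals. A quick check, using a hypothetical helper:

```python
def fsm_parameter_counts(n_states: int, n_inputs: int) -> dict:
    """Counts of evolvable table parameters for a |Q| x |Sigma| machine."""
    entries = n_states * n_inputs
    return {"bits": entries, "reals": 3 * entries, "integers": entries}


print(fsm_parameter_counts(4, 4))   # {'bits': 16, 'reals': 48, 'integers': 16}
print(fsm_parameter_counts(16, 2))  # {'bits': 32, 'reals': 96, 'integers': 32}
```

The count grows with the product of states and input symbols. This is one reason larger machines have more parameters to adapt, a point taken up again in section 5.1.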

To make the searching task hard for the optimizers, in each generation they are 
placed on different randomly chosen locations in the quasi-infinite, toroidal environ- 
ment. The optimizers don’t have any knowledge about their position. Additionally, 
there are no landmarks on the torus so that orientation becomes non-trivial. Neverthe- 
less, in EFSM high-level optimizers evolve that use fascinating searching techniques. 

Fig. 1 depicts an example of the phenotype of an optimizer. It is a 4×4 machine obtained after 300 generations. The background illustrates the environment, which is a 3D version of Rastrigin’s function (cf. [13]). The local minima are indicated by darker colors. The global minimum is located at the center. Moves realized by the optimizer are drawn with black lines, and measuring events without moves are depicted with light gray lines. Positions of the optimizer are indicated by white dots. The starting point was at the lower right corner. After some erroneous moves, the optimizer starts to hop from minimum to minimum and reaches the vicinity of the global minimum after only five hops. It can realize this hopping because it systematically combs the neighborhood of a local minimum at a promising distance. In the situation presented, the optimizer performed only 179 measurements.
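For reference, the common n-dimensional form of Rastrigin’s function is f(x) = 10n + Σ(x_i² − 10 cos(2π x_i)). The "3D version" corresponds to n = 2 with f drawn as height, which gives the grid of local minima around a global minimum at the origin. The exact scaling and domain used in EFSM are not stated here and may differ from this sketch.

```python
import math
from typing import Sequence


def rastrigin(x: Sequence[float], a: float = 10.0) -> float:
    """Standard Rastrigin function; global minimum f(0, ..., 0) = 0."""
    return a * len(x) + sum(xi * xi - a * math.cos(2 * math.pi * xi) for xi in x)


print(rastrigin([0.0, 0.0]))  # 0.0, the global minimum at the center
print(rastrigin([1.0, 1.0]))  # about 2.0, close to a neighboring local minimum
```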

Apparently, it is very difficult for humans to design an equally efficient optimizer. Even knowing this phenotype, which embodies a favorable search technique, it took us hours to code a similar minimum-hopping automaton by hand. Yet the human-coded one was inferior to those produced by EFSM within a couple of minutes.
Fig. 1. An example of the phenotype of an FSM created by EFSM.



4 Species in Populations of FSMs 

In biology a species is defined through reproductive isolation. In ALife systems surro- 
gate criteria for genetic similarity are frequently employed to distinguish species. 

Here we define genetic similarity for FSMs by the overall structure of the FSMs: individuals with the same genome structure are defined as one species. The structure of the transition table T and the output table O is determined by the state set Q, the input alphabet Σ and the output alphabet Δ. Thus a prerequisite for the existence of various species in one population is that Q, Σ and Δ are not identical in all individuals. In order to observe species formation, the FSMs should additionally be able to isolate themselves reproductively from other species (i.e. from FSMs with differing Q, Σ and Δ) or to rejoin them during evolution, whenever this might be evolutionarily advantageous.
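Since a species is defined by genome structure, counting species amounts to grouping individuals by their structure. In EFSM the output alphabet is held fixed, so in this minimal sketch each individual is reduced to a hypothetical (|Q|, |Σ|) record.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Structure(NamedTuple):
    n_states: int   # |Q|
    n_inputs: int   # |Sigma|


def species_sizes(population: Iterable[Structure]) -> Counter:
    """Map each genome structure (species) to its number of members."""
    return Counter(population)


pop = [Structure(2, 4), Structure(3, 4), Structure(2, 4), Structure(7, 4)]
sizes = species_sizes(pop)
print(len(sizes), sizes[Structure(2, 4)])  # 3 2
```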

In EFSM we have realized these prerequisites. The population can be initialized with diverse state sets Q and diverse input alphabets Σ. Thus we are able to initialize simulations with single- or multi-species populations of arbitrary diversity. The output alphabet remains fixed, because we do not allow the optimizers’ effectors to evolve. We also inserted a strategy parameter, subject to evolution, which decides whether the individual accepts mating with individuals of differing Q and Σ. After initialization there is no regulating instance influencing the population’s diversity. The number of species obtained is purely a result of simulated evolution.
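The evolvable mating-acceptance parameter can be sketched as a per-individual flag. Two individuals of the same structure can always recombine, while individuals of differing Q or Σ recombine only if the flag allows it. The text does not state whether one or both partners must accept, so requiring both is our assumption, as are the field names.

```python
from dataclasses import dataclass


@dataclass
class Individual:
    n_states: int                 # |Q|
    n_inputs: int                 # |Sigma|
    accepts_foreign_mates: bool   # evolvable strategy parameter


def can_mate(a: Individual, b: Individual) -> bool:
    """Same-structure pairs may always recombine; pairs of differing Q or
    Sigma only if both partners accept (assumed rule)."""
    same_species = (a.n_states, a.n_inputs) == (b.n_states, b.n_inputs)
    return same_species or (a.accepts_foreign_mates and b.accepts_foreign_mates)


x = Individual(3, 4, accepts_foreign_mates=True)
y = Individual(5, 4, accepts_foreign_mates=False)
print(can_mate(x, y), can_mate(x, x))  # False True
```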

As mentioned in section 1, the approaches using EP or GAs as the Evolutionary Algorithm could not investigate species formation. In [6] only one individual was employed in the population. Their successors followed this line of work by rejecting recombination as a genetic operator, even when they used populations with more than one individual. Conversely, when a GA is used, the whole population consists of only one species: crossover rapidly leads to uniform populations. In some non-FSM applications, researchers tried to introduce more diversity through different niching techniques [7]. The outcome of these simulations is predetermined by the formula applied for fitness sharing. Therefore there cannot be any endogenous species formation when using a GA. Additionally, Q, Σ and Δ must be uniform in GA populations, as the genome length is fixed. Thus neither approach was able to model species formation.



5 Experimental Results 

In this section we present experimental results obtained from several simulations performed with the program EFSM. First we check whether there are superior state sets or input alphabets by comparing performance measures of runs with a fixed state set Q and a fixed input alphabet Σ. This check is necessary because evolution would probably lead to such an optimum if it existed. In a second set of simulation runs, Q and Σ are subjected to evolutionary adaptation. The third series of runs investigates the relationship between species formation and varying environmental diversity.



5.1 Are There Optimal State Sets or Optimal Input Alphabets? 

To answer this question, two series of simulation runs with different state sets Q and input alphabets Σ were performed. As the results are much the same in both cases, we only show the results for differing Q. In all runs Q was fixed and could not evolve. The populations were initialized with a uniform number of 1, 2, 3, 4, 8 or 16 states. The number of input symbols was set to 2. Five simulations were done for each of the state sets, giving 30 runs in total. We used a set of parameter settings that had proved adequate in earlier extensive testing (population size: 100, offspring per generation: 50, roulette wheel reproduction, no fitness scaling, mutation probability 0.04, recombination probability 0.4, recombination type: exchange of blocks of the transition and output tables; an optimizer can perform 400 measurements per generation). The environment used was the so-called Sphere model. As performance parameter we used the ‘Current Mean’, which is the average fitness of all 100 optimizers in the current population. The fitness value of a single optimizer is the function value of the Sphere model at the end point of the optimizer’s navigation. The maximum in the domain is set to 1 and the minimum to 0, so high-performance optimizers obtain fitness values close to 0.
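The fitness used in these runs can be sketched as follows. The sphere model is f(x) = Σ x_i². Rescaling it so that the domain maximum maps to 1 and the minimum to 0 gives the fitness of an optimizer's end point, and ‘Current Mean’ is the population average of that value. The box bound of 5.0 is hypothetical. The `SETTINGS` dictionary only restates the parameters listed above and is not an EFSM configuration format.

```python
from statistics import mean
from typing import Iterable, Sequence

SETTINGS = {
    "population_size": 100,
    "offspring_per_generation": 50,
    "reproduction": "roulette wheel",
    "fitness_scaling": None,
    "mutation_probability": 0.04,
    "recombination_probability": 0.4,
    "recombination_type": "block exchange (transition and output tables)",
    "measurements_per_generation": 400,
    "input_symbols": 2,
}


def sphere(x: Sequence[float]) -> float:
    return sum(xi * xi for xi in x)


def normalized_sphere(x: Sequence[float], bound: float = 5.0) -> float:
    """Sphere value rescaled to [0, 1] on the box [-bound, bound]^n
    (the bound is a hypothetical choice)."""
    f_max = len(x) * bound * bound   # attained at the box corners
    return sphere(x) / f_max


def current_mean(end_points: Iterable[Sequence[float]]) -> float:
    """Average fitness of all optimizers in the current population."""
    return mean(normalized_sphere(p) for p in end_points)


print(normalized_sphere([0.0, 0.0]), normalized_sphere([5.0, -5.0]))  # 0.0 1.0
```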

The results are depicted in Fig. 2. On the left-hand side (Fig. 2A) we first observe the large dispersion across the 30 runs. Even on a logarithmic scale, we note a fast improvement during the first 50 generations. Later we observe typical characteristics of evolution: after periods of evolutionary stagnation, there are sudden unpredictable ‘inventions’ leading to a fast improvement of ‘Current Mean’. The best results were obtained with two-state populations, but this setting is not significantly superior to the other state sets. The five corresponding runs are highlighted in Fig. 2A; the other 25 runs are drawn with gray lines.

In Fig. 2B we zoom in on the first 50 generations and compare only the time series for 1, 4 and 16 states. On this time scale the number of states influences performance: larger state sets correspond to poorer performance during the first generations. The results confirm our expectations. State machines with larger state sets contain many more parameters that have to be adapted during evolution, leading to slower improvement at the beginning. Additionally, larger structures are generally less robust with regard to mutation and recombination. On the other hand, larger state sets allow a greater variety of actions, which emerge in later phases. These more complex optimizers recover the initial disadvantage in the following generations.

Fig. 2. Time series of the performance measure ‘Current Mean’ with varying but fixed (non-evolving) state sets. A: Highlight on 2 states, with 1, 3, 4, 8 and 16 states in the background, for 300 generations. B: Comparing 1, 4 and 16 states for the first 50 generations.

We conclude that there is no optimal state set toward which the optimizers automatically evolve. On the contrary, which state sets are ‘superior’ depends on the time scale considered. Having examined optimality, we are now ready to let the state sets and input alphabets evolve.



5.2 Species Formation as a Result of Simulated Evolution 

As stated above, the number of states and the input alphabet are not predetermined and may be subject to evolution. Taking into account the results of section 5.1, we might expect an evolution towards larger state sets and input alphabets. To address this question, two series of experiments were performed in which either the number of states was not fixed (and could vary between 2 and 10) or the input alphabets could evolve (with cardinalities 2, 4, 6, 8 or 10). We made 19 simulation runs for each series. In contrast to the parameter settings in section 5.1, all strategy parameters (e.g. mutation and recombination rates, recombination types) were allowed to evolve. The strategy parameter that decides whether the individual accepts mating with individuals of differing Q and Σ evolved, too. Rastrigin’s function [13] was used as the test environment.

Fig. 3A shows the evolution of the arithmetic mean of the cardinality of the input alphabets of all optimizers (‘Mean No. of Inputs’). In all simulations a uniform input alphabet was reached soon, and later evolution depended on these early “frozen accidents”. However, the cardinalities vary heavily between runs. It seems that alphabets with higher cardinality are slightly preferred. During the first 10 to 20 generations the mean number of states (‘Mean No. of States’, cf. Fig. 3B) decreased in all runs. This is consistent with the result of section 5.1 that smaller state sets are advantageous at the beginning. Very similar to the input alphabets, the number of states evolves quickly toward a uniform value throughout the population. But again, the resulting numbers of states vary heavily between runs. Somewhat surprisingly, the uniform value, once obtained, does not rise in later phases.

Fig. 3. Mean number of input symbols (A) and mean number of states (B).

We conclude that in EFSM there is no evolutionary tendency towards a specific state set or input alphabet. But there is a strong tendency towards single-species populations (uniform Q and Σ). After initialization there is no regulating instance influencing the population’s diversity, implying that this species formation is an endogenous development of simulated evolution. It appears that the individuals agree on certain Q and Σ as a genetic grammar. This grammar may be interpreted as a type of ‘genetic broadcast language’. It seems that they need this common language as a prerequisite for the successful operation of recombination. As the individuals are not forced into sexual reproduction, the formation of species is a decentralized group decision.



5.3 Species Formation as a Function of Environmental Diversity 

In a last series of runs we investigated the relationship between the speed of species formation and environmental diversity in EFSM. One approach to explaining species diversity in biological ecosystems is the variety of ecological niches. In EFSM we observe different specialists evolving for the different landscapes. Thus we can simulate more ecological diversity by simultaneously posing more than one landscape to be optimized. The new fitness criterion is the sum of the results obtained by the optimizers on each landscape. The individuals and species still compete within one population, but now there might be different specialists for different functions. An individual might compensate for a poor performance on one landscape with a high performance on another.
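Posing several landscapes at once changes only the fitness criterion: the optimizer's results on the individual landscapes are summed. In the sketch below, each landscape is a function of the optimizer's end point, and running an optimizer on a landscape is abstracted as a hypothetical `run` callback that returns that end point. Standard forms of Rastrigin’s and Griewank’s functions are used. The Schwefel landscapes are omitted because the text does not specify which variant was used.

```python
import math
from typing import Callable, Sequence

Landscape = Callable[[Sequence[float]], float]


def rastrigin(x: Sequence[float]) -> float:
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)


def griewank(x: Sequence[float]) -> float:
    s = sum(xi * xi for xi in x) / 4000.0
    p = math.prod(math.cos(xi / math.sqrt(i)) for i, xi in enumerate(x, start=1))
    return 1.0 + s - p


def multi_landscape_fitness(run: Callable[[Landscape], Sequence[float]],
                            landscapes: Sequence[Landscape]) -> float:
    """Sum of the optimizer's end-point values on every landscape (lower is better)."""
    return sum(f(run(f)) for f in landscapes)


# A toy optimizer that always ends at the origin scores 0 on both landscapes.
print(multi_landscape_fitness(lambda f: [0.0, 0.0], [rastrigin, griewank]))  # 0.0
```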

We executed simulation runs with a single landscape (either Rastrigin’s function or Griewank’s function [13]) and with four landscapes simultaneously (Rastrigin’s, Griewank’s, Schwefel’s [12] and a rotated version of Schwefel’s function). The runs were initialized with uniform input alphabets (four symbols) but with numbers of states varying between two and seven. Consequently, after initialization there were always six species in the population. The remaining parameter settings were the same as in section 5.1. Fig. 4 depicts the time series of the number of species for the three variants; the number of species shown is the mean over the corresponding 20 runs. In all three variants the number of species decreases rapidly to two, but the value ‘Mean No. of Species’ remains higher in the more diverse environment. Thus in EFSM, as in nature, higher environmental diversity leads to a higher diversity of species.
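Counting species can be sketched as grouping individuals by their structure. As a simplifying assumption we identify a species with a (number of states, alphabet size) pair; the EFSM genomes themselves are not modelled, and `EFSMShape` and `initial_population` are hypothetical names. With states drawn from two to seven and a fixed four-symbol alphabet, the initial population splits into six species, as stated above.

```python
import random
from collections import Counter
from typing import List, NamedTuple


class EFSMShape(NamedTuple):
    """Structural signature of an individual (hypothetical simplification)."""
    n_states: int   # |Q|
    n_symbols: int  # size of the input alphabet


def initial_population(size: int, rng: random.Random) -> List[EFSMShape]:
    # Uniform four-symbol alphabets, between two and seven states (inclusive).
    return [EFSMShape(rng.randint(2, 7), 4) for _ in range(size)]


def species_counts(population: List[EFSMShape]) -> Counter:
    """Number of individuals per species, a species being one structural signature."""
    return Counter(population)


if __name__ == "__main__":
    pop = initial_population(500, random.Random(0))
    print(len(species_counts(pop)), "species")
```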




Fig. 4. Number of species for high environmental diversity (four landscapes) and low diversity 
(one landscape). 



6 Conclusions 

We did not observe the appearance of optimal numbers of states or optimal cardinalities of input alphabets. Smaller state sets are advantageous during early generations. We observed a strong selection pressure towards uniform state sets and uniform input alphabets, so that soon after the start of the evolutionary process only one or two species remained. This can be explained by the fact that recombination can only be meaningful between individuals with similar structure; successful genetic search therefore cannot take place across different grammars. Instead, the organisms have to agree on a common ‘genetic broadcast language’. From an ecological point of view, it is also plausible that only one species will survive under a given environmental condition. Indeed, if we use several static environmental conditions simultaneously we obtain higher diversity, even though we did not change the conditions of competition in the population.

We appreciate the financial support of the BMBF (grant BEO 51-0339476C). 
Abstract. Grammatical Evolution (GE) is a grammar-based GA which generates computer programs. GE is distinctive in that its input is a BNF grammar, which permits it to generate programs of arbitrary complexity in any language. Part of the power of GE is that it is closer to natural DNA than other Evolutionary Algorithms, and thus can benefit from natural phenomena such as a separation of search and solution spaces through a genotype to phenotype mapping, and a genetic code degeneracy which can give rise to silent mutations that have no effect on the phenotype.

It has previously been shown that GE is competitive with GP; in this paper we analyse the feature of genetic code degeneracy and its implications for genotypic diversity. Results show that, for the problem domains addressed here, genetic diversity is improved as a result of degeneracy in the genetic code.



1 Introduction 

Grammatical Evolution (GE) is a grammar-based, variable length, linear genome 
system which is capable of generating programs or expressions in any language. 
Rather than the functions and terminals associated with GP [3], GE takes a BNF 
specification of a language, or subset thereof, from which it can subsequently 
generate compilable code. The BNF is used to build a program by applying 
production rules to elements of the non-terminal set of the BNF definition, in a 
mapping process to generate the output code from a simple binary string. 

GE has been successfully applied to a number of diverse problem domains, such as symbolic regression [9], finding trigonometric identities [10], symbolic integration [10], and the Santa Fe trail [7]. The results compared favorably with those of systems such as GP, and GE has been shown to outperform GP [7].

In the spirit of Artificial Life, one definition of which is

[the] field of study devoted to understanding life by attempting to abstract the fundamental dynamical principles underlying biological phenomenon, and recreating these dynamics in other physical media, such as computers, making them accessible to new kinds of experimental manipulation and testing [5],

we have developed a system to generate programs which attempts to harness some of the features of the genetic machinery of living organisms that are theorised to have an impact on the phenomenon of evolution [1]. This paper focuses on genetic code degeneracy, a characteristic of the genetic code in biological organisms which has been incorporated into GE, and analyses the role this feature plays in maintaining genotypic diversity.

2 Grammatical Evolution 

When tackling any problem with GE, a suitable BNF definition must first be 
decided upon. The BNF can be either the specification of an entire language, 
or perhaps more usefully, a subset of a language geared towards the problem at 
hand. 

The genotype is then used to map the start symbol onto terminals by reading genes of 8 bits to generate a corresponding integer value, from which an appropriate production rule is selected using

rule = (integer gene value) MOD (number of production rules for the current non-terminal).
Consider the following rule:

    (1) <code> ::= <line>           (A)
                 | <code><line>     (B)

That is, given the non-terminal <code>, there are two production rules to select from. If we assume the gene being read produces the integer 6, then 6 MOD 2 = 0 would select rule (A), <line>. Each time a production rule has to be selected to map from a non-terminal, another gene is read, and in this way the system traverses the genome.
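The mapping just described can be sketched in Python. This is a minimal illustration, not the GE implementation: the grammar below is a hypothetical toy loosely modelled on rule (1), codons are given directly as integers in 0..255, and a genome that runs out of codons before mapping completes is simply treated as invalid.

```python
from typing import Dict, List, Optional

# Hypothetical toy grammar (not the paper's): production lists are ordered,
# so index 0 is rule (A), index 1 is rule (B), and so on.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<code>": [["<line>"], ["<code>", "<line>"]],
    "<line>": [["left();"], ["right();"], ["move();"]],
}


def map_genome(codons: List[int], start: str = "<code>") -> Optional[List[str]]:
    """Map codon values to terminals by always expanding the leftmost non-terminal.

    Each expansion reads the next codon and picks
    rule = codon MOD (number of productions for that non-terminal).
    Returns None (an invalid individual) if the codons run out first.
    """
    symbols = [start]
    pos = 0  # index of the next codon to read
    while True:
        nt = next((k for k, s in enumerate(symbols) if s in GRAMMAR), None)
        if nt is None:
            return symbols  # only terminals remain
        if pos >= len(codons):
            return None  # genome exhausted before mapping completed
        rules = GRAMMAR[symbols[nt]]
        choice = rules[codons[pos] % len(rules)]
        pos += 1
        symbols = symbols[:nt] + choice + symbols[nt + 1:]


if __name__ == "__main__":
    print(map_genome([6, 2]))  # ['move();']: 6 MOD 2 = 0 picks <line>, 2 MOD 3 = 2 picks move();
```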

Given an 8-bit binary number, each gene can represent 256 distinct integer values. However, many of these integer values represent the same production rule. Taking production rule (5) as an example, if the current gene value were 6, then 6 MOD 3 = 0 would select rule (A), left(). The same rule would be chosen if the gene value were 0, 3, 9, 12, etc.
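The degeneracy can be made concrete by enumerating which 8-bit values select a given production under the MOD rule. The helper name `codon_values_for_rule` is ours; the three-production case matches the rule (5) example above.

```python
from typing import List


def codon_values_for_rule(rule: int, n_rules: int, bits: int = 8) -> List[int]:
    """All gene values in [0, 2**bits) that select production `rule` via MOD."""
    return [v for v in range(2 ** bits) if v % n_rules == rule]


if __name__ == "__main__":
    values = codon_values_for_rule(0, 3)
    print(values[:5])   # [0, 3, 6, 9, 12]
    print(len(values))  # 86 of the 256 values select rule (A)
```

The other two productions are each selected by 85 values, so every production is reachable through many distinct genotypes.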

3 The Problem Spaces 

We examined two problem domains on which GE was previously found to be successful, namely a symbolic regression problem [9] and the Santa Fe ant trail [7]. For the symbolic regression problem the system is given a set of input and output pairs, and must determine the function that maps one onto the other. The particular function examined is $f(x) = x^4 + x^3 + x^2 + x$, with input values in the range $[-1, +1]$. To determine the fitness of an individual program, 20 test points were applied in the input range, and the fitness was taken as the sum of the error. The objective of the Santa Fe ant trail is to find a computer program to control an artificial ant so that it can find all 89 pieces of food located on a non-continuous trail within a specified number of time steps. The ant can only turn left, turn right, or move forward one square. It may also look ahead one square in the direction it is facing to determine whether that square contains a piece of food.
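The symbolic regression fitness can be sketched as below. We assume evenly spaced test points and absolute error; the text states only that 20 points in the input range were used and that the fitness is the sum of the error. The function names are ours.

```python
from typing import Callable, List


def target(x: float) -> float:
    # The quartic target given above.
    return x**4 + x**3 + x**2 + x


def test_points(n: int = 20, lo: float = -1.0, hi: float = 1.0) -> List[float]:
    # Evenly spaced points (an assumption; the text gives only the count and range).
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)]


def raw_fitness(program: Callable[[float], float], xs: List[float]) -> float:
    """Sum of absolute errors over the test points; 0 is a perfect fit."""
    return sum(abs(program(x) - target(x)) for x in xs)


if __name__ == "__main__":
    xs = test_points()
    print(raw_fitness(lambda x: x**4 + x**3 + x**2, xs))  # error of a candidate missing the x term
```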
4 Results

Two sets of experiments were carried out for each problem domain. In fifty runs, genetic code degeneracy was removed as far as possible by reducing the number of bits in a gene to the lowest value that can still represent the maximum number of production rules belonging to any one non-terminal. Another fifty runs were carried out with degeneracy present, as in the standard GE implementation using 8-bit genes.

Two measures were used to give an indication of genotypic diversity. The first is a measure which we have termed the mean variety, obtained by calculating the average of the variances at each bit locus on the genome. Since each locus holds a binary value, its variance across the population is at most 0.25 (reached when half of the population carries a 1 at that locus); with a population size of 500, the greater the variance in a population, the closer the mean variety measure is to 0.25. The aim of this measure is to establish how different the individual genotypes in a given population are.

The other measure used was the number of unique individuals in a population, which can also be used to some extent to illustrate the genetic diversity within a population [4].
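Both diversity measures are straightforward to compute. The sketch below assumes equal-length bit-string genomes, which sidesteps the question of how loci absent from shorter variable-length GE genomes should be treated; the function names are ours.

```python
from typing import Sequence


def mean_variety(population: Sequence[Sequence[int]]) -> float:
    """Average over bit loci of the population variance at that locus.

    A locus where a fraction p of individuals carry a 1 has variance
    p * (1 - p), which peaks at 0.25 when p = 0.5.
    """
    n = len(population)
    length = len(population[0])  # equal-length genomes assumed
    total = 0.0
    for locus in range(length):
        p = sum(ind[locus] for ind in population) / n
        total += p * (1 - p)
    return total / length


def unique_individuals(population: Sequence[Sequence[int]]) -> int:
    """Number of distinct genotypes in the population."""
    return len({tuple(ind) for ind in population})
```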

Results for both of these measures over both problem domains can be seen in Figures 1 and 2. As these graphs show, code degeneracy has a marked effect on the mean variety measure and on the number of unique individuals.

We also measured the average number of invalid individuals in each generation over the 50 runs for both problem domains. The results show that when degeneracy is removed from the genetic code, the number of invalid individuals increases significantly over a run compared with the case where a degenerate code exists.
Fig. 2. Genetic Code Degeneracy and Unique Individuals



5 Discussion 

In the standard GE implementation 8-bit genes are used, which can represent 256 distinct integer values. Therefore, in this form GE can represent up to 256 productions for each non-terminal in the grammar. In the case of the Santa Fe trail the maximum number of productions that any one non-terminal has is 3, and for the symbolic regression problem this number is 4. As a result, the minimum number of bits any gene can have for these problems is two, as this can represent a maximum of 4 distinct productions. It follows that in the case of the symbolic regression problem all degeneracy has been removed, while it still exists to a very small extent in the Santa Fe trail problem.
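The argument about minimum gene width can be checked numerically. In this sketch (function names ours), `min_bits` returns the smallest width able to index every production, and `selections` counts how many gene values choose each production. With 2-bit genes, four productions are each selected exactly once, whereas three productions leave one of them selected twice, which is the small residual degeneracy mentioned for the Santa Fe trail.

```python
from collections import Counter
from typing import Dict


def min_bits(max_productions: int) -> int:
    """Smallest gene width whose 2**bits values can index every production."""
    return (max_productions - 1).bit_length()


def selections(n_rules: int, bits: int) -> Dict[int, int]:
    """How many of the 2**bits gene values select each production under MOD."""
    return dict(Counter(v % n_rules for v in range(2 ** bits)))


if __name__ == "__main__":
    print(min_bits(3), selections(3, 2))  # 2 {0: 2, 1: 1, 2: 1}
    print(min_bits(4), selections(4, 2))  # 2 {0: 1, 1: 1, 2: 1, 3: 1}
```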

Given the number of invalid individuals produced when genetic code degeneracy is removed, it is clear that degeneracy is responsible for preserving the functionality of the phenotype while still allowing unrestricted search of the genotypic search space. Based on the two measures used to indicate genotypic diversity, the results clearly show that degeneracy in the genetic code has a beneficial effect on the genotypic diversity of the population.

Kimura’s neutral theory of molecular evolution [2] suggests that molecular evolution is largely the result of mutations that have no effect on the phenotype, i.e. on functionality. Kimura mentions that this could also be responsible for maintaining genetic diversity within a population. The results produced by GE appear to reflect this effect within our artificial population, where it arises from the genetic code degeneracy that is part of the genotype to phenotype mapping process.
Following on from Kimura, others have noted the benefits that neutral evolution can bring to the evolutionary process [8] [6]. In particular, it has been suggested in [6] that the maximum fitness attainable on a fitness landscape with a degree of neutrality increases as the degree of neutrality increases. Further investigation is required to determine the ramifications, if any, of these and other observations for GE.

6 Conclusions 

The results presented here show that genetic code degeneracy has a clear beneficial effect on genotypic diversity, and on the preservation of valid phenotypes, during runs of GE. The benefits of a complex mapping process show that the one-to-one genotype to phenotype mapping that is so prevalent in other evolutionary algorithms is not necessarily a good idea. We have previously shown that the performance of GE can be enhanced by harnessing features from the genetic system of biological organisms, and as such there are clear advantages to be gained by using more biologically inspired techniques within evolutionary algorithms. It follows that by modifying the current genotype to phenotype mapping process to one that is even closer to that of biological organisms, we may find that the performance of GE benefits to an even greater extent.

Taking these findings together with the fact that GE has proven successful across diverse problem domains, we conclude that GE is a powerful approach to automatically generating programs in any language.
Abstract. This report presents an asynchronous, distributed genetic programming (GP) system using a master/slave architecture. Using symbolic regression of Fourier functions as the problem domain, the system was found to demonstrate cooperative coevolutionary dynamics when multiple client populations evolve solutions to similar, but different problems: specifically, closely coupled populations were found to promote continuous search, which in some cases leads to the discovery of better solutions.



1 Introduction 

The abilities of various evolutionary algorithms to traverse large search spaces have been well documented [5] [7], as has their amenability to parallel implementation [3]. Much work has investigated parallel architectures that partition, or otherwise reorganize, a single problem [4] [8] [11]; the focus is placed on the dispersion and synthesis of partial solutions, often referred to as building blocks or schemas.

However, there has been little investigation of parallel algorithms that attempt to distill useful information from a set of similar, but different problems. It seems plausible that populations evolving solutions to similar problems may benefit from sharing their results.

We have found that sharing is indeed possible, and that it is correlated with the degree of similarity between fitness landscapes. Most importantly, we have found that two (or more) populations that share solutions can help each other escape local optima and explore increasingly better optima. As opposed to Hillis’ [6] appropriation of parasitic coevolutionary dynamics to guard against premature convergence, the formulation presented here uses cooperative coevolutionary dynamics, in the form of solution transmission between populations, to reach continually better solutions.



2 The Model 

A distributed GP model using a master/slave architecture similar to that described in [9] is used to gather the results in this report: clients send their current most fit solution to a central server, and download solutions generated by other clients from that server. Unlike the architecture described in [9], however, all solutions received by the server are broadcast to a requesting client, with the exception of solutions sent by that same client.*
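The server's broadcast rule can be sketched as a small in-memory class. It is hypothetical: the transport (message passing between machines) is omitted, and we assume that every solution ever received is kept and re-broadcast, since the text does not say whether older uploads are discarded.

```python
from typing import Generic, List, Tuple, TypeVar

S = TypeVar("S")


class SolutionServer(Generic[S]):
    """Hypothetical central server for the master/slave scheme.

    Clients upload their current most fit solution; a download returns every
    solution received so far except those uploaded by the requesting client.
    """

    def __init__(self) -> None:
        self._received: List[Tuple[str, S]] = []  # (client_id, solution) in arrival order

    def upload(self, client_id: str, solution: S) -> None:
        self._received.append((client_id, solution))

    def download(self, client_id: str) -> List[S]:
        # Exclude the requester's own uploads so it cannot reconverge on them.
        return [sol for sender, sol in self._received if sender != client_id]
```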

We have chosen the problem domain of symbolic regression with which to gather quantitative measurements of the coevolutionary dynamics of the described model. Problem sets are defined as parametric Fourier functions of the form

$$y_i = \sum_{j=1}^{5} \left( a_j \sin(j x_i) + b_j \cos(j x_i) \right),$$

where each Fourier function is characterized by selecting values for each $a_j$ and $b_j$. Each problem set is composed of 40 values for the independent variable $x$, uniformly distributed in $[-5, 5]$. The raw fitness of an s-expression $s$ in the GP population is set to $\lVert y_i - o_{i,s} \rVert$, where $o_{i,s}$ is the output of $s$ when presented with the independent variable $x_i$.
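A problem set and the raw fitness can be sketched as follows. Assumptions of ours: the 40 x values are evenly spaced over [-5, 5], the coefficients are drawn from [-1, 1] (the text gives no range), and the norm in the raw fitness is read as a sum of absolute errors over the 40 points.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def make_problem_set(a: Sequence[float], b: Sequence[float], n_points: int = 40,
                     lo: float = -5.0, hi: float = 5.0) -> Tuple[List[float], List[float]]:
    """Sample y_i = sum_j a_j sin(j x_i) + b_j cos(j x_i), j = 1..len(a)."""
    step = (hi - lo) / (n_points - 1)
    xs = [lo + k * step for k in range(n_points)]
    ys = [sum(a[j - 1] * math.sin(j * x) + b[j - 1] * math.cos(j * x)
              for j in range(1, len(a) + 1)) for x in xs]
    return xs, ys


def random_coefficients(rng: random.Random, terms: int = 5,
                        scale: float = 1.0) -> Tuple[List[float], List[float]]:
    # Hypothetical coefficient range [-scale, scale].
    a = [rng.uniform(-scale, scale) for _ in range(terms)]
    b = [rng.uniform(-scale, scale) for _ in range(terms)]
    return a, b


def raw_fitness(program: Callable[[float], float],
                xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sum of absolute errors |y_i - o_i| over the training points (0 is perfect)."""
    return sum(abs(y - program(x)) for x, y in zip(xs, ys))


if __name__ == "__main__":
    xs, ys = make_problem_set(*random_coefficients(random.Random(0)))
    print(raw_fitness(math.sin, xs, ys))
```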

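It is then possible to quantitatively measure the distance between two training sets $t_1$ and $t_2$ using

$$d(t_1, t_2) = \sum_{i=1}^{40} \lVert y_{1,i} - y_{2,i} \rVert \qquad (1)$$

where $y_{k,i}$ is the value of training set $t_k$ at $x_i$, if we assume that the set of independent variables $x$ is invariant across all possible training data.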
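Equation (1) and the selection of the closest pair of problem sets can be sketched as below. Like the text, this assumes that all sets share the same x values; absolute differences are our reading of the norm, and the helper names are ours.

```python
import itertools
from typing import List, Sequence, Tuple


def distance(y1: Sequence[float], y2: Sequence[float]) -> float:
    """Sum of absolute differences between two training sets at shared x values."""
    return sum(abs(u - v) for u, v in zip(y1, y2))


def closest_pair(problem_sets: List[Sequence[float]]) -> Tuple[int, int]:
    """Indices (i, j), i < j, of the two problem sets with minimum distance."""
    return min(itertools.combinations(range(len(problem_sets)), 2),
               key=lambda ij: distance(problem_sets[ij[0]], problem_sets[ij[1]]))
```

With 40 generated Fourier functions this examines all 780 pairs, which is cheap at this scale.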

All of the results below were generated using a GP instantiation with a function set covering arithmetic and trigonometric functions; a terminal set of the independent variable and two floating point constants; one mutation for every 100 nodes involved in subtree crossover; a population size of 1000; and tournament selection with a tournament size of 10.

Each client population is assigned one Fourier function as described above, and is run for 500 generations, a number chosen to ensure convergence in most populations. The benefits of convergence prior to migration are reported in [1] and [10]. At the end of the run the most fit solution from each population is sent to the central server, and the populations are then reinitialized with random solutions together with solutions downloaded from the server. Transferred solutions then participate in selection and crossover according to their fitness in the new population. Each population operates asynchronously; populations terminate and begin new runs independently of the other populations.
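One client's cycle can be sketched as a loop over runs. Everything here is a hypothetical harness: `evolve` stands in for 500 generations of GP, lower fitness is treated as better (raw error), and `upload`/`download` stand in for the server calls.

```python
from typing import Callable, List, TypeVar

S = TypeVar("S")


def run_client(n_runs: int, pop_size: int,
               random_solution: Callable[[], S],
               evolve: Callable[[List[S]], List[S]],
               fitness: Callable[[S], float],
               upload: Callable[[S], None],
               download: Callable[[], List[S]]) -> List[float]:
    """Evolve, upload the fittest, then restart from downloads plus random solutions."""
    immigrants: List[S] = []
    best_per_run: List[float] = []
    for _ in range(n_runs):
        # Downloaded solutions compete with fresh random ones in the new run.
        population = immigrants[:pop_size]
        population += [random_solution() for _ in range(pop_size - len(population))]
        population = evolve(population)      # e.g. 500 generations of GP
        best = min(population, key=fitness)  # lower raw error is fitter
        upload(best)
        best_per_run.append(fitness(best))
        immigrants = download()
    return best_per_run
```

Because each client only calls `upload` and `download` between runs, clients need no shared clock, matching the asynchronous operation described above.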

3 Results 

For the case of a set of client populations assigned the same problem set, the model presented in this report operates as a distributed panmictic population: the client populations are equivalent to demes, in which solutions migrate at fixed intervals, and the demes are fully connected [2].

* This minimizes the possibility of a client population reconverging to a solution it had previously uploaded to the server.

Consider now a set of client populations evolving solutions to similar, but different problems. This situation was modelled by generating 40 Fourier functions using random values of $a_j$ and $b_j$ as discussed in section 2. The distance between each pair of the 40 problem sets was determined using equation (1), and the two problem sets with the minimum distance were assigned to two client populations.

The two client populations were run for 500 generations each, sending their most fit solution to the server at the end of each run. The populations were then re-initialized with random solutions, together with solutions retrieved from the server. This process was repeated over 20 runs. It was found that, as the number of runs increased, both populations experienced a decrease in the normalized fitness of their most fit solution at the end of each run. The results obtained are shown in Figure 1, contrasted against a control case in which two populations were evolved for 500 generations over 20 runs with no sharing.




Fig. 1. A plot of the performance increase exhibited by client populations evolving solutions to similar training data. The test case involved two populations evolving solutions over 20 runs of 500 generations each, and trading their most fit solutions after each run via a central server. The control case involved two populations evolving over 20 runs of 500 generations each, without trading solutions. Results from the test case and control case were averaged over 10 iterations.



4 Discussion 

In modelling two client populations evolving solutions to similar, but different problem sets, it was found that the populations do not converge on a single optimum, but rather continue to explore regions of the search space near some optimum common to both populations. This contrasts with the findings for traditional, multi-deme GA populations reported in [3], in which the parameter dictating the migration rate of solutions between demes acts as a threshold, below which demes converge on different, but inferior optima, and above which demes (sometimes prematurely) converge on the best solution discovered across all demes.

The continuous search property of the model presented in this report is 
demonstrated in Figure 2, in which two client populations are contrasted. In 
the control population, 20 runs of 500 generations are performed, and averaged 
over 20 iterations. At the end of each run, the most fit solution in the popula- 
tion is preserved, but undergoes a large mutation, in which a sub-tree of depth 
two is replaced with a random sub-tree of depth two. The remaining solutions 
in the population are replaced with random solutions. No solutions from this 
population are sent to or received from the central server. The other population 
trades solutions with other evolving populations as described in section 2. Both 
populations evolve solutions to the same fourier function. 
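The control population's large mutation can be sketched on expression trees stored as nested tuples. The operator and terminal sets here are hypothetical, depth is counted with terminals at depth 0, and the replaced node is drawn uniformly from all nodes whose subtree has depth exactly two; the text does not specify these details. Because a depth-two subtree is swapped for another depth-two subtree, the depth of the whole tree is unchanged.

```python
import random
from typing import List, Tuple, Union

Tree = Union[str, tuple]  # a terminal name or (operator, left, right)

OPS = ["+", "-", "*"]          # hypothetical binary operators
TERMINALS = ["x", "c1", "c2"]  # hypothetical variable and two constants


def depth(t: Tree) -> int:
    return 0 if isinstance(t, str) else 1 + max(depth(c) for c in t[1:])


def random_full(d: int, rng: random.Random) -> Tree:
    """Random full binary tree of depth d."""
    if d == 0:
        return rng.choice(TERMINALS)
    return (rng.choice(OPS), random_full(d - 1, rng), random_full(d - 1, rng))


def paths_with_depth(t: Tree, d: int, path: Tuple[int, ...] = ()) -> List[Tuple[int, ...]]:
    """Paths (tuple indices from the root) to every subtree of depth d."""
    found = [path] if depth(t) == d else []
    if not isinstance(t, str):
        for k, child in enumerate(t[1:], start=1):
            found.extend(paths_with_depth(child, d, path + (k,)))
    return found


def replace_at(t: Tree, path: Tuple[int, ...], new: Tree) -> Tree:
    if not path:
        return new
    k = path[0]
    node = list(t)
    node[k] = replace_at(t[k], path[1:], new)
    return tuple(node)


def large_mutation(t: Tree, rng: random.Random, d: int = 2) -> Tree:
    """Replace a randomly chosen depth-d subtree with a fresh random depth-d subtree."""
    candidates = paths_with_depth(t, d)
    if not candidates:
        return t  # no subtree of the required depth
    return replace_at(t, rng.choice(candidates), random_full(d, rng))
```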

The large mutation experienced in the first population is meant to simulate 
the movement of a population a small distance away from the fitness optimum it 
had recently achieved. During subsequent evolution, the population may return 
to its previous optimum, or discover another, either inferior or superior optimum. 

Over the 20 runs it was found that both populations continue to find better 
solutions, but that the population that trades solutions with other populations 
reaches better optima than those discovered by the control population. This 
indicates that the population that shares solutions discovers new, better optima 
more often than the population that undergoes repeated large mutations. 




Fig. 2. Two populations evolving solutions to the same problem set for 20 runs of 500 generations each. In the control population, at the end of each run, the best solution undergoes a large mutation, while the remaining solutions are replaced with random solutions. In the test population, at the end of each run, the best solution is sent to the server, all solutions are replaced with random solutions, and solutions are then downloaded from the central server. The results from both populations were averaged over 20 iterations.
5 Conclusions

In this paper we have explored the cooperative, coevolutionary dynamics that 
appear in an asynchronous, distributed genetic programming architecture. As 
opposed to the usual formulation of a multideme, distributed GA with a speci- 
fied transmission rate of solutions between populations, our architecture services 
client populations with different problem sets drawn from a common problem 
domain. Transmission of solutions is arbitrated by a central server. 

It was found that mutual exchange of solutions between client populations 
with similar, but different problem sets exhibits a coevolutionary dynamic; solu- 
tions from one population, when inoculated in another population, can pull the 
second population away from its current local optimum, and often lead to the 
discovery of better optima. 

References 

1. Braun, H. C.: On Solving Travelling Salesman Problems by Genetic Algorithms. In: Schwefel, H.-P. & R. Männer (eds.): Parallel Problem Solving from Nature, pp. 129-133. Springer-Verlag, Berlin (1990).

2. Cantu-Paz, E.: Topologies, Migration Rates, and Multi-Population Parallel Genetic 
Algorithms. To Appear in: GECCO-99, Genetic and Evolutionary Computation 
Conference, July 13-17, Orlando FL (1999). 

3. Cantu-Paz, E.: A Survey of Parallel Genetic Algorithms. In: Calculateurs Parallèles. 10:2 (1998).

4. Cantu-Paz, E. & D. Goldberg: Modelling Idealized Bounding Cases of Parallel Genetic Algorithms. In: Koza, J., K. Deb, M. Dorigo, D. Fogel, M. Garzon, H. Iba & R. Riolo (eds.): Genetic Programming 1997: Proceedings of the Second Annual Conference, pp. 353-361. Morgan Kaufmann, San Francisco CA (1997).

5. Goldberg, D. E.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Redwood City CA (1989).

6. Hillis, D. W.: Co-Evolving Parasites Improve Simulated Evolution as an Optimization Procedure. In: Langton, C. G., Taylor, C. (eds.): Artificial Life II, pp. 313-322. Addison-Wesley, Redwood City CA (1992).

7. Koza, J. R.: Genetic Programming: On the Programming of Computers by Means 
of Natural Selection. MIT Press, Cambridge MA (1992). 

8. Mahfoud, S. W.: A Comparison of Parallel and Sequential Niching Methods. In: 
ICGA 6, pp. 136-143. (1995). 

9. Marin, F. J., O. Trelles-Salazar & F. Sandoval: Genetic Algorithms on LAN-message passing architectures using PVM: Application to the Routing Problem. In: Davidor, Y., H.-P. Schwefel & R. Männer (eds.): Parallel Problem Solving from Nature III, pp. 534-543. Springer-Verlag, Berlin (1994).

10. Munetomo, M., Y. Takai & Y. Sato: An Efficient Migration Scheme for Subpopulation-Based Asynchronously Parallel Genetic Algorithms. In: Forrest, S. (ed.): Proceedings of the Fifth International Conference on Genetic Algorithms, p. 649. Morgan Kaufmann, San Mateo CA (1993).

11. Miller, B. L. & M. J. Shaw: Genetic Algorithms with Dynamic Niche Sharing for Multimodal Function Optimization. In: IEEE International Conference on Evolutionary Computation, pp. 786-791. IEEE Press, Piscataway NJ (1996).




The Evolution of Computation in Co-evolving 
Demes of Non-uniform Cellular Automata for 
Global Synchronisation 



Vesselin K. Vassilev, Julian F. Miller, Terence C. Fogarty 

School of Computing, Napier University, Edinburgh, EH14 1DJ, UK
v.vassilev@dcs.napier.ac.uk



Abstract. We study the evolution of computation performed by non- 
uniform cellular automata in which global information processing ap- 
pears at two different levels of self-organisation. In our model, the first 
level of self-organisation is characterised by interactions among cellular 
macrostructures or computational demes which compete for room in a 
finite grid of cells. This level is related to the formation, evolution and 
extinction of macrostructures, and it is designed in a completely local 
manner. The second level of self-organisation refers to the interactions 
among the cells within the demes. The model, derived from the cellular 
programming approach, allows global computation to occur as a result of 
many local interactions among computational demes of interacting cells. 
The study reveals some of the mechanisms by which co-evolving demes of non-uniform cellular automata perform non-trivial computation, such as the synchronisation task.



1 Introduction 

This paper studies the evolution of computation performed by a system in which 
global information processing appears as a result of the interactions among many 
components, each of which is a system which in turn exhibits an ability for 
global computation at a different level of self-organisation. The model, derived 
from the non-uniform cellular automata model [1-3], is inspired by the idea of 
co-evolving to a higher degree of specialisation with macrostructures in a single 
computational ecosystem. For other relevant work, we refer to the ecological 
models studied in [4-6]. 

Cellular automata (CAs) are discrete dynamical systems of simple locally 
connected interacting cells [7-9]. They are simple, general, and computationally 
powerful [10, 11]. The model is perhaps the simplest example of systems that are 
capable of emergent computation: global information processing that appears in systems from the action of many interacting components [12, 13]. The evolution of CAs that perform non-trivial computation has been studied in [14, 15], where a population of CA rules was evolved to attain a single CA able to perform a particular computational task. However, computation requiring global coordination can be arduous for CAs [16], even if they are designed by evolutionary techniques. It has been suggested that better computational performance can be attained by using many CA rules or a non-uniform CA [17].

Non-uniform CAs are cellular automata whose cells may contain different transition rules [17]. The evolvability of non-uniform CAs has been studied in [18], where the cellular programming algorithm was employed to locate non-uniform CAs for different computational tasks. It appears that the evolution of non-uniformity leads to a quasi-uniform CA, with one dominant rule occupying most but not all of the cells [18].

In this paper we present progress on a related but distinct approach. We employ a modification of cellular programming to evolve non-uniform CAs in which global computation occurs through local interactions among computational demes of interacting cells. The notion of a deme comes from biology, where it refers to local subpopulations (demes) that are sufficiently isolated to permit differentiation within a population [19]. In evolutionary computation, the term pertains to various implementations of evolutionary algorithms in which several subpopulations are evolved in parallel [20]. The motivation for co-evolving demes of cells is to obtain a co-evolutionary model in which co-evolution towards quasi-uniformity can be avoided. The feasibility of such a model has been discussed in another paper [21]. Here, we concentrate on how the process of “building” emergent computation is related to the co-evolution of computational demes of co-evolving cells. The paper reveals some of the mechanisms that co-evolving demes of non-uniform cellular automata employ to perform non-trivial computation, such as the synchronisation task.

In the next section, we introduce the CA model and describe the synchronisation task. Section 3 defines the co-evolutionary algorithm derived from the cellular programming approach. Section 4 investigates the evolution of computational demes for global synchronisation. Finally, we give conclusions and outline our intentions for future work.

2 Cellular Automata and the Synchronisation Task 

A CA consists of an array of cells, each of which can be in one of a finite number k of states, and a transition rule by which the states of the cells are changed synchronously in discrete time, t. We say that the cells interact locally, since the state of each cell at the next time step is specified by the current states of the cell and its surrounding cells. The state of a cell together with the states of the surrounding cells is called the neighbourhood of the cell, and the size of the neighbourhood is the number of inputs of the CA rule. Since we consider one-dimensional CAs, the size of each neighbourhood is $2r + 1$, where $r$ is the rule radius. We say that a CA is non-uniform if its cells may contain different transition rules.

The behaviour of a CA can be illustrated by its space-time diagram in which 
the configuration of states in the grid is given as a function of time. Such an 
illustration is given in Figure 1 which represents the evolution of one-dimensional 
uniform and non-uniform CAs with k = 2 and rule radius 3. Each grid is taken 
with spatially periodic boundary conditions, meaning that the grid is considered as a circle in which the leftmost and rightmost cells are immediate neighbours. The space-time diagrams represent the evolution of CAs of 149 cells, starting from a randomly generated initial configuration and iterated over 149 time steps, with time increasing down the page. White and black dots represent cells in states 0 and 1, respectively.




Fig. 1. The synchronisation task. Space-time diagrams of (a) uniform and (b) non-uniform CAs, discovered by evolutionary techniques. The initial configurations are chosen randomly with densities of 1s $\rho(0) \approx 0.51$. The CAs are one-dimensional with grid size 149, k = 2, and rule radius r = 3.



The global synchronisation task is to locate a CA which performs a simple oscillation between all-0s and all-1s configurations of the lattice. The task is non-trivial for rule radius $r \ll N$, where $N$ is the number of cells, since the synchronous oscillation is a global property of the cellular array which must be attained by local interactions between neighbouring cells.

The synchronisation task was introduced in [15], where a genetic algorithm [22] was employed to evolve a population of cellular automata rules for a uniform CA which can solve the task. The fitness value of each CA is the fraction of correct oscillations between all-0s and all-1s configurations, attained after M = 149 iterations from randomly chosen initial configurations with densities of 1s uniformly distributed over $\rho(0) \in [0.0, 1.0]$.

The synchronisation task was also studied in [18], where cellular programming was employed to co-evolve non-uniform CAs. The fitness value of each co-evolved non-uniform CA is the average of the fitness values of all cells, each calculated with respect to that cell in the same manner as in [15].

Figure 1 illustrates the evolution of two solvers of the synchronisation task:
(a) a uniform CA [15], and (b) a non-uniform CA co-evolved by cellular programming.
The plots reveal how the CAs converge to synchronous oscillations between
configurations of all 0's and all 1's within M = 149 time steps.
3 The Co-evolutionary Model

We show how it is possible to co-evolve non-uniform CAs which are able to 
perform emergent computation at two different levels of self-organisation. In 
this method, the global computation performed by a particular evolved non- 
uniform CA is attained by interactions among computational demes of cells 
where the computation of each deme arises from the interaction of all the cells 
belonging to the deme. Such a simple model is easily identifiable in many systems 
in nature capable of self-organisation [23]. The algorithm is a modification of 
cellular programming given by Sipper [17] and it is defined as follows: 

1. Initialise the population of N cells at random with CA rules, uniformly
distributed among different λ values (the percentage of all the entries in the
rule table which map to non-zero states) [24], and set N demes, each of
which has size 1 (a single cell).

2. Generate 2(N + 1) initial configurations at random with uniformly distributed
densities of 1's over the interval [0, 1].

3. Run the CA N + 1 steps for each configuration and attach a fitness value to
each cell by determining the number of correct states of the cell in the final 
configurations, then calculate the fitness value of each deme by averaging 
the fitness values of the deme’s cells. 

4. Label each cell with a fitness score that specifies the number of fitter neigh- 
bouring cells which are elements of the cell’s deme. 

5. Update the population. 

(a) Update the demes as follows: (1) if a deme of one cell is surrounded by 
fitter demes then the deme (cell) is invaded by the fittest deme; (2) if 
a deme of two or more cells has a fitter neighbour, then the border cell 
is invaded by the fitter neighbouring deme. The invaded cell is labelled 
with the highest possible fitness score. 

(b) Update the cells as follows: if the fitness score of a cell is 0, leave the
rule unchanged; if it is 1, replace the rule with the mutated rule of the fitter
neighbour, even if that neighbour has been invaded by a neighbouring deme; if it
is 2 or higher, replace the rule by recombination of the two fitter neighbouring
rules, even if those neighbours have been invaded by neighbouring demes, followed
by mutation.

6. If not finished, go to step 2. 
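Steps 1, 4 and 5(b) can be sketched as below. This is a simplified illustration, not Sipper's or the authors' implementation: we assume that a cell's neighbours are its two adjacent cells on the ring, we compute all new rules from the old ones, and we ignore the interplay with invasion in step 5(a) (invaded cells receiving the highest score). λ is drawn uniformly in [0, 1] for each cell and expressed as a fraction; all function names are hypothetical.

```python
import random


def lam(rule):
    """Langton's lambda for k = 2: fraction of rule-table entries that map to state 1."""
    return sum(rule) / len(rule)


def random_rule(size, rng):
    # Draw lambda uniformly in [0, 1], then place round(lambda * size) ones at random,
    # so that initial rules are spread across different lambda values (step 1).
    ones = round(rng.random() * size)
    rule = [1] * ones + [0] * (size - ones)
    rng.shuffle(rule)
    return rule


def init_population(n, r, rng):
    """Step 1: n random rule tables and n single-cell demes (deme label = cell index)."""
    size = 2 ** (2 * r + 1)
    rules = [random_rule(size, rng) for _ in range(n)]
    demes = list(range(n))
    return rules, demes


def mutate(rule, p, rng):
    # Uniform mutation: flip each bit independently with probability p.
    return [1 - b if rng.random() < p else b for b in rule]


def crossover(a, b, rng):
    # One-point crossover of two equal-length rule tables.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def update_cells(rules, fitness, demes, p_mut, rng):
    """Steps 4 and 5(b), simplified to the two adjacent cells on the ring.

    A cell's score is the number of adjacent cells that belong to its deme and
    are fitter than it: 0 keeps the rule, 1 copies the fitter neighbour's rule
    with mutation, 2 recombines both neighbours' rules and then mutates.
    """
    n = len(rules)
    new_rules = []
    for i in range(n):
        fitter = [j for j in ((i - 1) % n, (i + 1) % n)
                  if demes[j] == demes[i] and fitness[j] > fitness[i]]
        if not fitter:
            new_rules.append(list(rules[i]))
        elif len(fitter) == 1:
            new_rules.append(mutate(rules[fitter[0]], p_mut, rng))
        else:
            child = crossover(rules[fitter[0]], rules[fitter[1]], rng)
            new_rules.append(mutate(child, p_mut, rng))
    return new_rules
```

A full generation would evaluate the cells on fresh random configurations (steps 2 and 3), update the deme labels, and then call `update_cells` with a small mutation probability such as the 0.001 used in Section 4.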

An important part of the algorithm that needs additional explanation is how to
determine when a deme can invade cells of its neighbours. We say that a deme
invades a cell when the deme expands over the cell. Clearly, if a deme which is
fitter than its neighbours is always able to expand over the grid, then the
population will very soon converge to a single deme, and evolution at the
deme level will subside. To avoid the very fast expansion of the fitter demes over
the grid, we employ a threshold function τ(t), called the deme's tolerance, which
depends on a real parameter ρ > 0 and on n(t), the number of demes at
evolutionary step t. Thus a deme d' will invade a cell of a neighbouring
deme d'' if the deme's fitness value f(d') is higher than f(d'') + τ(t). Initially,
for 0 < ρ ≪ N, the tolerance τ(t) is very small, so deme invasion is promoted;
later, as n(t) drops, the demes only invade if they are substantially fitter. The
mechanism allows us to maintain several isolated demes in a grid of strongly
interacting cells, and thus to attain a higher order of cellular computation.
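The invasion test and one round of deme updating (step 5(a)) can be sketched as follows. The exact form of τ(t) is not reproduced here, so the sketch uses τ(t) = ρ / n(t) purely as a hypothetical stand-in with the stated behaviour (small while n(t) is close to N, growing as demes disappear); it also simplifies rule (1) of step 5(a), as noted in the docstring. All names are hypothetical.

```python
def tolerance(rho, n_demes):
    # HYPOTHETICAL form of tau(t): the text only requires tau to be small while the
    # number of demes n(t) is large and to grow as n(t) falls; rho / n(t) does that.
    return rho / n_demes


def invades(f_invader, f_target, rho, n_demes):
    """Deme d' may take a border cell of deme d'' only if f(d') > f(d'') + tau(t)."""
    return f_invader > f_target + tolerance(rho, n_demes)


def update_demes(labels, deme_fitness, rho):
    """One round of deme invasion on a ring of cells (step 5(a), simplified).

    labels       -- deme label of each cell
    deme_fitness -- dict mapping deme label -> deme fitness
    A deme takes at most the one border cell next to it at each border; if two
    demes compete for the same cell, the fitter one wins. Unlike rule (1) in the
    text, a single-cell deme is invaded even if only one neighbour is fitter.
    """
    n = len(labels)
    n_demes = len(set(labels))
    claims = {}  # target cell -> invading deme
    for i in range(n):
        j = (i + 1) % n
        a, b = labels[i], labels[j]
        if a == b:
            continue  # cells i and j are in the same deme: not a border
        if invades(deme_fitness[a], deme_fitness[b], rho, n_demes):
            target, invader = j, a
        elif invades(deme_fitness[b], deme_fitness[a], rho, n_demes):
            target, invader = i, b
        else:
            continue
        current = claims.get(target)
        if current is None or deme_fitness[invader] > deme_fitness[current]:
            claims[target] = invader
    new_labels = list(labels)
    for target, invader in claims.items():
        new_labels[target] = invader
    return new_labels
```

With ρ = 1.2, this stand-in gives τ ≈ 0.008 when all 149 cells are separate demes and τ = 0.15 once only 8 demes remain, so late invasions require a clearly fitter deme.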

The algorithm is similar to cellular programming in the following respect. 
The evolutionary process in the two levels of self-organisation is organised in 
a completely local manner. First, the interactions among the demes are local, 
since each deme can see only its immediate neighbours, and the deme can invade
only one cell per neighbour. Second, the interactions among the cells are
local, since the rule of each cell can be replaced with a rule obtained by applying
mutation or recombination to rules of neighbouring cells. The mutation is
uniform with a certain probability of flipping a bit of the rule, and the operator
for recombination is one-point crossover.

The co-evolutionary model, introduced above, differs from many parallel evo- 
lutionary algorithms defined mainly on the coarse-grained island or fine-grained 
cellular models [20, 18] in three aspects. Firstly, the evolution is performed in two 
different levels of self-organisation - evolution of demes, and evolution of cells
within the demes, which correspond to the aforementioned island and cellular
models, respectively. Secondly, the model deals with simple co-evolutionary
mechanisms which lead to the appearance of interacting demes, each of which
may differ from the others in size. The demes are implicitly defined, since they
appear during the evolution by the simple merging of neighbouring cells. The
formation, evolution, and extinction of demes lead to the evolution of unique
cellular macrostructures whose size strongly depends upon the capabilities of
the corresponding cells to perform useful computation. Lastly, the algorithm
maintains a population in which the communications between the demes are
implicitly defined. In this respect, our model strongly differs from other coarse-grained
parallel evolutionary algorithms in which the demes' interconnections
are explicitly specified by a certain migration scheme. In our model, the demes
communicate implicitly through the environment, which is highly dynamic,
and the communications appear as a result of the endeavour of the demes to
cooperate in order to perform useful computation.

The non-uniform cellular computation attained by co-evolving computational 
demes is extremely intriguing. We performed many experiments with different 
tolerance parameters in which various synchronising non-uniform CAs were dis- 
covered. The co-evolved non-uniform CAs consist of a small number of demes 
each of which is a quasi-uniform CA. In our experiments, we observed two dif- 
ferent computational strategies for achieving global synchronisation, explicitly 
studied in [21]. In short, the computational strategies are 

Strategy I Some of the demes are perfect solvers of the task while the remain- 
der are perfect parasites - harmless and helpless - only capable of propagating 
the information signals to the nearest solver(s). 

Strategy II The global synchronisation appears through the cooperation of all demes,
each of which is unable to perform the task on its own. In this strategy, the decisive
computation is performed by two or more demes, while the remainder ensure
and accelerate the computation in the aforementioned group of demes.

Fig. 2. Synchronising computational demes: space-time diagrams of co-evolved non-uniform
CAs that perform the synchronisation task by (a) strategy I, and (b) strategy
II. The CAs are one-dimensional with grid size 149 and k = 2, r = 2. The initial
configurations are chosen randomly.

The strategies are illustrated in Figure 2 which depicts the space-time diagrams 
of two co-evolved non-uniform CAs. 

4 The Co-evolution of Demes for Synchronisation 

In this section, we examine in detail the process of co-evolution of demes, using
the model above, for the synchronisation task. The co-evolved non-uniform CAs
are one-dimensional with two possible cell states and rule radius 2 (k = 2 and
r = 2). The grid consists of 149 cells with spatially periodic boundary conditions.
The deme's tolerance parameter ρ is set to 1.2. The mutation probability is 0.001.

We describe the process whereby global synchronisation is constructed, by 
following the average fitness of the population in a typical co-evolutionary run. 
According to the average fitness plot depicted in Figure 3, we identify six epochs, 
samples of which, obtained at generations (a) 1, (b) 12, (c) 28, (d) 37, (e) 46, and 
(f) 241, are described in Figures 4 and 5. The first epoch starts at generation 0
and each of the following epochs is characterised by significant improvements
in the ability of the non-uniform CA's strategy to solve the synchronisation
task. For each epoch, we investigate the features of the non-uniform CA that
characterise the epoch: firstly, by examining the distribution of the cells amongst
the computational demes, given in Figure 4, and secondly, by analysing the
space-time diagram of the discovered non-uniform CAs, depicted in Figure 5.
The latter figure also reveals how the computational demes are located in the
cellular lattice with respect to the two reference demes d40 and d56. So, having
Figures 4 and 5, we can easily determine the demes and their locations in the
grid. For instance, the CA obtained at generation 241 consists of demes d56, dn,
d72, dgs, d101, d}i3, dii 3, and d145 (Figure 4f), which are the 1st, 2nd, 3rd, etc.
demes, starting from deme d56 (Figure 5f).

Fig. 3. Co-evolving demes of non-uniform CAs for global synchronisation. Results of a
typical co-evolutionary run with rule radius 2: (a) the average fitness of all grid cells,
and (b) the number of computational demes.

In addition, we study how the performance, P, of the co-evolved non-uniform
CAs changes. The performance of a CA is the fraction of the initial
configurations for which the CA successfully synchronises in 149 iterations.
The performance is scaled in the interval [0, 1] and is estimated over 300
randomly chosen initial configurations with a uniform distribution of densities
of 1's over the interval [0, 1].

Epoch 1 (generations 0-11) Initially, the number of demes is 149, which determines
a very low tolerance among the demes. Since the cells have different abilities
for solving the task, the number of demes decreases faster than linearly.
The epoch can be called a wild phase. It is characterised by a fast
extinction of unicellular organisms and a formation of multicellular ones. The 
formation of multicellularity is also related to the appearance of co-evolution 
within the demes. At the end of the epoch, it leads to the formation of local 
areas of the lattice that successfully synchronise on a small fraction of the input 
configurations. The performance of the CAs in this epoch increases from 0
to approximately 0.02. An example of a non-uniform CA, which is labelled with
“a” in the presented figures, is taken from generation 1. Its space-time diagram 
(Figure 5a) shows a huge diversity of cellular computations represented at this 
stage of the co-evolution. 

Epoch 2 (generations 12-20) Although the average fitness is significantly higher,
the fraction of the initial configurations that the CAs are able to synchronise
is low, P ≈ 0.04, which indicates that only local synchronisation is attained.
This is shown in Figure 5b, which presents the space-time diagram of the non-uniform
CA attained at generation 12. The CA consists of 18 different demes
(Figure 4b), the majority of which individually perform localised synchronisation
for a small fraction of initial configurations, without cooperation. The epoch is
characterised by a fast extinction of demes (Figure 3b), especially those comprising
a small number of cells.

Fig. 4. The distribution of cells amongst the computational demes attained at generations
(a) 1, (b) 12, (c) 28, (d) 37, (e) 46, and (f) 241, as labelled in Figure 3.



Epoch 3 (generations 21-33) The increase in average fitness is followed by an
increase in the performance of the co-evolved non-uniform CAs from 0.073 to 0.286.
This is caused by the appearance of cooperative demes, as revealed in Figure 5c,
which shows the evolution of a non-uniform CA attained at generation 28. It can
be seen that the localised synchronisation starts to expand over the whole grid,
but only until it meets deme d145, which is a relatively poor solver when
compared with the remainder of the grid. We suggest that cooperation emerges
as a result of the co-evolution within the demes, as indicated by the slight
changes in the demes' characteristics (Figures 3b and 4c).



Epoch 4 (generations 34-45) The epoch is characterised by the extinction of
deme d^g, which does not change the computational strategy of the CA, and
the evolution of deme d145, which leads to an increase of the CA's performance to
0.576. The space-time diagram of the CA obtained at generation 37 is given in
Figure 5d. It shows that the computation performed by d145 is significantly
improved; however, the deme is still badly adapted to its surroundings,
dio and dn$. According to Figures 4d and 5d, deme d145 is the 7th deme to the
right of d56.
Fig. 5. The evolution of computation in co-evolving demes for synchronisation. The
space-time diagrams illustrate the evolution of the non-uniform CAs attained at generations
(a) 1, (b) 12, (c) 28, (d) 37, (e) 46, and (f) 241, as labelled in Figure 3. The
initial configurations are chosen randomly with densities of 1's (a) ρ ≈ 0.074, and
(b)-(f) ρ ≈ 0.503.



Epoch 5 (generations 46-240) The co-evolution of deme d145 observed in the
previous epoch led to the discovery of a new computational strategy of the
deme. It is marked by a rapid increase of the CA's performance to P ≈ 0.81,
which is also caused by re-adjustments amongst the demes (Figure 4d-e).
The evolution of the non-uniform CA attained at generation 46 is shown in
Figure 5e. It can be seen that d145 is perfectly matched with demes d40 and
du 5. However, we noticed that often taken together with d40 synchronises
in more than 149 iterations. The figure also shows that the changes in demes
d40 and ds 5 led to another unsolvable computational problem, captured in the
space-time diagram. The epoch is characterised by the evolution of d40, which
finally becomes extinct.

Epoch 6 (generations 241-300) The epoch starts with the extinction of deme d40,
followed by a fitness increase to 0.999 and an increase of the CA performance
to approximately 0.997. The problem is perfectly solved for 300 CA iterations.
The type of computation of the solver, shown in Figure 5f, is described in
section 3 as computational strategy II. Although the deme's tolerance is high,
it seems that deme d40 was unable to adapt in the co-evolved scenario,
which caused its extinction, followed by a perfect match between d145 and
Significant changes in the remainder of the grid were not observed (Figures 4f
and 5f).

5 Conclusions 

We have studied the co-evolution of synchronising non-uniform CAs in which 
the global information processing appears as a result of the co-evolution of com- 
putational demes, each of which is a system of interacting cells, capable of global 
computation at a different level of self-organisation. To attain co-evolution at two
levels of self-organisation, we defined a threshold function that increases as the
number of demes decreases, so that the dynamics of the interactions amongst the
demes were increasingly suppressed. This allowed us to co-evolve remarkable
computational strategies for non-uniform CAs that perform the synchronisation task.

We were able to study a simple process of formation of multicellular organ- 
isms, each of which may differ from the other in size (the number of cells). A 
co-evolutionary model of interacting organisms that differ from each other in 
size and mass has been studied by [6]. The author has studied the importance 
of body size for adaptation in heterogeneous ecological communities. Our inves- 
tigation goes a little further. It suggests that the adaptation in an ecosystem is 
not only related to the size of an organism but also to the organism’s ability 
to cooperate in a complex environment. This is clearly demonstrated in the last
two figures where, for instance, at generation 46 demes d40 and d72 consist of
30 cells and 1 cell, respectively; however, at generation 241 deme d40 no longer
exists, while deme d72 still consists of a single cell.

The presented model offers two main directions for future work. Firstly, it 
will be interesting to investigate how the model works for other computational
tasks, such as the density classification task, and how the computational
strategies of the co-evolved CAs relate to other values of the rule radius.
Secondly, the model could be improved by taking account of our intuition that
better co-evolutionary search would be attained if we related the deme's
tolerance to the size of the demes. This remains for future work.
Abstract One important implication of embodiment is that, by acting, agents
partially determine the sensory patterns they receive from the environment. The
motor actions performed by an agent, by modifying the agent's position with
respect to the external environment and/or the external environment itself,
partially determine the type of sensory patterns received from the environment.
In this paper we investigate how agents can take advantage of this ability. In
particular, we discuss how agents coordinate sensory and motor processes in 
order to (1) select sensory patterns which are not affected by the aliasing 
problem and avoid those which are; (2) select sensory patterns such that groups 
of patterns which require different responses do not strongly overlap; (3) 
exploit emergent behaviors that result from the interaction between the agent 
and the environment. 



1. Introduction 

Recently, a new research paradigm has challenged the traditional view according to 
which intelligence is an abstract process that can be studied without taking into 
consideration the physical aspects of natural systems. In this new paradigm, research
tends to stress the importance of situatedness (i.e. the importance of studying systems,
natural or artificial, which are situated within an external environment) and
embodiment (i.e. the importance of studying systems which have bodies, receive input
from their sensors and produce motor actions as outputs). We will refer to systems
which are embodied and situated as agents.

One important implication of embodiment is that, by acting, agents partially 
determine the sensory patterns they receive from the environment. The motor actions
performed by an agent, by modifying its position with respect to the external
environment and/or the external environment itself, in fact partially determine the
type of sensory patterns that it will receive from the environment. As we will see
below, agents can take advantage of this ability in different ways. We will refer to the 
process of exploiting the agent-environment interaction (i.e. the ability to select 
sensory patterns that are useful for some purpose through certain motor actions) as 
sensory-motor coordination (for a similar view see [1]). We will show three different 
ways in which sensory-motor coordination can help to solve otherwise insoluble 
problems. 

The way in which sensory-motor coordination can be exploited also depends on the
characteristics of the agent. As shown in [2], for example, agents that are able to
modify themselves (i.e. to learn) during their interaction with the environment can use
sensory-motor coordination to select sensory patterns that are useful to learn (i.e. they
can alter the frequency and the order of different sensory patterns to enhance the
outcome of the learning process). In this paper, however, we will restrict our analysis
to the simplest case - that of pure sensory-motor agents which cannot modify
themselves during their interaction with the environment and which do not retain any
trace of the previously experienced sensory patterns. These agents, by definition,
always react in the same way to the same sensory pattern.



2. How sensory-motor coordination can cope with the perceptual 
aliasing problem 

One of the most straightforward ways in which sensory and motor processes can be 
coordinated is to solve the perceptual aliasing problem. Perceptual aliasing, a term
coined by Whitehead and Ballard [3], refers to the situation wherein two or more
identical sensory patterns require different responses in order to achieve a certain
goal. When such a situation occurs (i.e. when an agent receives a sensory pattern that
requires different motor responses in different circumstances), the agent should act in
order to select other sensory patterns until a sensory pattern which is not affected by
the aliasing problem (i.e. an unambiguous sensory pattern) is encountered.

Consider for example the case of a Khepera robot [4] (Fig. 1, left) which is placed
in an environment containing two types of object: one with the top painted black,
which should be avoided, and one with the top painted white, which should be
approached (Fig. 1, right). The Khepera robot is provided with 8 infrared proximity
sensors which can detect the bottom part of the obstacles up to a distance of about 3
cm. Moreover, the robot is provided with a K213 linear camera which has 64
photoreceptors and produces a linear image composed of 64 pixels of 256 gray-levels 
each, subtending a view-angle of 36°. On the motor side, the robot is provided with 
two wheels controlled by two motors which can rotate in both directions. 




Fig. 1. The robot and the environment. The robot has 8 infrared proximity sensors that can 
detect the bottom part of the obstacles and a linear camera that can detect the color of the top 
part of the obstacles. 



Every time such a robot approaches an object that does not happen to be in the
viewing angle of its camera, it will experience an ambiguous sensory pattern (i.e. a
sensory pattern which is affected by the aliasing problem). In fact, the same type of
sensory pattern will be experienced by the robot both in the case of objects to be
approached and in that of objects to be avoided. The obvious solution to this type of
problem is to turn toward the object. By turning so as to have the object within the
36° view angle of the camera, the robot will finally receive an unambiguous sensory
pattern (i.e. frontal infrared sensors on and a white image for objects to be approached,
and frontal infrared sensors on and a black image for objects to be avoided). The
process of selecting sensory patterns that are easy to discriminate through motor
actions is usually referred to as active perception [5]. Several examples of processes
falling within this category have been identified in natural organisms. As
demonstrated by Dill et al. [6], for example, in order to recognize certain visual
patterns, Drosophila moves so as to shift the perceived image to a certain location in
the visual field.
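The aliasing condition can be illustrated with a toy encoding of the Khepera example. In this hypothetical sketch a sensory pattern is reduced to a pair (frontal infrared on/off, camera reading), with the camera reading `"none"` when the object lies outside the 36° view; a pattern counts as aliased if it occurs with more than one required action, in which case no purely reactive mapping can answer it correctly. The encoding and the name `aliased_patterns` are our own assumptions.

```python
from collections import defaultdict


def aliased_patterns(samples):
    """Return the sensory patterns that require more than one different action.

    samples -- iterable of (sensory_pattern, required_action) pairs; a pattern
               can be any hashable value, e.g. a tuple of sensor readings.
    """
    actions = defaultdict(set)
    for pattern, action in samples:
        actions[pattern].add(action)
    return {pattern for pattern, acts in actions.items() if len(acts) > 1}


samples = [
    ((1, "none"), "approach"),  # white-topped object outside the camera's view
    ((1, "none"), "avoid"),     # black-topped object outside the camera's view
    ((1, "white"), "approach"),
    ((1, "black"), "avoid"),
]
print(aliased_patterns(samples))  # {(1, 'none')}
```

Turning toward the object replaces the aliased pattern `(1, "none")` with one of the two unambiguous ones, which is the active-perception strategy described above.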



3. How sensory-motor coordination can simplify hard problems

In the previous section we saw how the problem of the same sensory pattern requiring 
different answers can be solved by selecting other unambiguous sensory patterns, if 
any exist, through sensory-motor coordination. In this section we will investigate 
problems in which various sensory patterns requiring different motor answers 
strongly overlap, although not completely. When this happens, agents face hard
problems that may be difficult or even impossible to solve. In this section we will try
to show how sensory-motor coordination can turn these hard problems into simpler
ones.

The distinction between simple and hard problems has been recently formalized by
Clark and Thornton [7]. They introduced the term type-2 problems to denote hard
tasks in which the problem of mapping input patterns into appropriate output patterns
is complicated by the fact that the regularities¹, which can allow such mapping, are
hidden or marginal within the sensory patterns. On the basis of this consideration
these authors distinguished type-2 problems (i.e. hard problems) from type-1
problems (i.e. easy problems) in which a sufficient number of regularities are directly
available in the sensory patterns.

As claimed by Clark and Thornton, type-2 problems which require complex input- 
output mapping may be reduced to type-1 (tractable) problems by re-coding sensory 
information so as to enhance useful regularities. This can be achieved in two different
ways. One possibility is to internally re-code sensory inputs so as to enhance useful
regularities. Elman [8], for example, showed how complex tasks which cannot be
solved by training a feed-forward neural network using standard back-propagation
can be solved if the network is first trained using a simpler subtask and then exposed 
to the full task. As claimed in [7] and [8] this can be explained by considering that the 
first learning phase affects how the sensory patterns are re-coded at the level of the 
internal representations. This re-coding, by enhancing the regularities of the sensory 
patterns, turns the process of learning the entire task into a type-1 problem [7]. 



¹ The term 'regularity' refers to features of the sensory patterns which can be used to
discriminate between classes of sensory patterns that require different answers. The exact
meaning of the term will become clearer later on.
As shown by Scheier et al. [9], however, agents can also transform a type-2
problem into a type-1 problem by actively structuring their own input through 
sensory-motor coordination (interestingly, this strategy, as we will see, cannot be used 
by systems that are passively exposed to sensory states, i.e. systems which are not 
embodied and situated). 

The authors considered the case of a Khepera robot which is supposed to approach 
large, and to avoid small, cylindrical objects. The robot is provided with six frontal 
infrared proximity sensors and 2 motor units that encode the actual speed of the two 
wheels. The environment is an arena surrounded by walls which contains small and 
large cylinders. 

As shown by Nolfi [10, 11], who reported a similar experiment in which a Khepera
robot had to approach small cylindrical objects and avoid walls, the task of
discriminating between the two types of objects is far from trivial for an agent that is
required to passively discriminate between the sensory patterns produced by the two
objects. This can be explained by considering that the sensory patterns received by the
robot depend largely on the distance and on the relative angle between the robot and
the objects. As a consequence, the sensory patterns belonging to the two categories
largely overlap. "Put differently, the distance in sensor space for data originating from
one and the same object can be large, while the distance between two objects from
different categories can be small" [9, p. 1559]. On the other hand, as shown by Nolfi
[10, 11] and by Scheier et al. [9], the task can easily be solved by an agent that is left
free to perform sensory-motor coordination. Aside from these similarities, the two
experiments seem to call for different explanations. For this reason we will describe
the experiment reported in [9] first and the experiment described in [10, 11] later.

In the simplest experiment reported in [9] Scheier et al. used artificial evolution to 
select the weights of the robot's neural controllers. Individuals' fitness was increased 
each time step they were close to a large object and decreased each time step they 
were close to a small object or a wall. As the authors show, performance increases
during the first generations and stabilizes in the vicinity of optimal performance after
about 40 generations. In other words, while passive systems (i.e. neural networks
which are required passively to classify a set of patterns corresponding to two
different objects, see [10, 11]) display poor performance, agents that are allowed to
exploit sensory-motor coordination can solve the task easily. The fact that the 
coordination between the sensory and the motor processes is crucial in solving this 
task can be clearly demonstrated by observing the behavior of evolved individuals 
and by observing how the distribution of the sensory patterns changes through the 
generations. 

As reported in [9], the fittest individuals in 86% of the runs move in the 
environment until they start to perceive an object (large or small) and then start to 
circle around the object (the other 14% stop in front of the objects, although these 
individuals display significantly poorer performance). At this point the robot
continues to circle around large objects while avoiding and abandoning small objects.
This circling behavior is crucial to accomplish the discrimination between the two
types of object given that the sensory patterns that the robot experiences while 
circling the small objects are significantly different from those that the robot 
experiences while circling the large objects. In other words, the sequence of motor 
actions leading to the circling behavior allows the robot to select sensory patterns that 
can easily be discerned. 

The role of sensory-motor coordination has been further demonstrated by 
measuring the extent to which sensory patterns belonging to different objects are
separated for individuals of different generations (i.e. by measuring the difficulty of 
the discrimination task for different individuals). Indeed, given that the type of 
sensory pattern that an individual receives from the environment depends partly on 
how the individual reacts to each sensory state, individuals who behave differently 
may face harder or simpler discrimination tasks. To obtain this measure the
authors used the geometric separability index (GSI) proposed by Thornton [12], 
which can be used to quantify the distinction between type-1 and type-2 problems 
introduced above. In the case of this experiment, the GSI can be used to measure to 
what extent sensory patterns corresponding to one and the same object are close in 
sensory space and to what extent sensory patterns corresponding to different objects 
are separated. GSI is computed by counting the average number of times the sensory 
pattern nearest to the current sensory pattern falls into the same category. In the case 
of patterns corresponding to only two categories (small and large objects) this can be 
calculated in the following way: 

$$GSI(f) = \frac{1}{N} \sum_{i=1}^{N} \big( f(x_i) + f(x_i') + 1 \big) \bmod 2$$

where f is the category of the object (0 or 1), x is the set of N sensory pattern
vectors x_i, and x_i' is the nearest neighbor of x_i.
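The index can be sketched as below. The text does not state which distance defines the nearest neighbor, so Euclidean distance (`math.dist`, Python 3.8 or later) is our assumption, as is breaking ties in favour of the lower index; `gsi` is a hypothetical name. A value of 1 means every pattern's nearest neighbor belongs to the same category, while values near 0.5 are what one expects when two categories are thoroughly mixed.

```python
import math


def gsi(patterns, labels):
    """Geometric separability index for two categories (labels 0 or 1).

    For each pattern x_i, find its nearest other pattern x_i' and add
    (f(x_i) + f(x_i') + 1) mod 2, which is 1 when both share a category
    and 0 otherwise. Returns the average over all N patterns.
    """
    n = len(patterns)
    total = 0
    for i in range(n):
        # Nearest neighbour by Euclidean distance, excluding the pattern itself.
        nearest = min((j for j in range(n) if j != i),
                      key=lambda j: math.dist(patterns[i], patterns[j]))
        total += (labels[i] + labels[nearest] + 1) % 2
    return total / n
```

For the robot experiments, each pattern would be a vector of sensor readings recorded during a run and each label the object (small or large) that produced it.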

As reported in [9], in these experiments the GSI value starts from about 0.5 and 
monotonically increases during the first 40 generations until it reaches a stable state 
around 0.9 (note that performance also increases during the first 40 generations). This 
means that individuals of successive generations increase their ability to coordinate 
the sensory and motor processes so that experienced sensory patterns corresponding 
to one object are similar amongst themselves and are different from sensory patterns 
corresponding to other objects. In other words, evolved individuals are able to 
transform a type-2 problem into a type-1 one. 



4. Exploiting emergent solutions 

In the previous two sections we have considered the case in which it is difficult to
react appropriately to some of the sensory patterns because they are affected by the
aliasing problem (i.e. the same sensory pattern requires different motor answers) or
because the regularities present in the sensory patterns are hidden or marginal (i.e.
groups of sensory patterns that require different answers largely overlap). As we
showed, both problems can be solved by using sensory-motor coordination. In this
section we will examine a similar problem and again we will see how it can be solved
by using sensory-motor coordination. The way in which sensory-motor coordination
solves this problem, however, seems to be qualitatively different from the two cases
described above. Indeed, as we will see, this way of solving the problem is effective
also in cases in which all sensory patterns are affected by the aliasing problem or, in
other words, cases in which the sensory data do not contain any regularities at all.

As mentioned above, Nolfi conducted an experiment in which a Khepera robot was
required to discriminate between walls and cylindrical objects by finding and
remaining close to the latter [10, 11]. The environment was an arena of 60 × 35 cm
surrounded by walls, with a cylindrical object about 3 cm in diameter placed at a
random position inside it. To develop the control system for this robot Nolfi used
artificial evolution. Individuals' fitness was increased at each time step in which they
were close to the cylindrical object. After a few generations the best individuals
succeeded in ending up close to the cylindrical object within 500 cycles most of the
time. This means that they were able to discriminate between the two objects,
avoiding walls and remaining close to the cylinders (see Fig. 2, thick line).




[Fig. 2 plot: x-axis = generations (0 to 50)]

Fig. 2. Thick line: Percentage of times the best individuals of each generation succeed in 
ending up close to the cylindrical object after 500 cycles. Thin line: GSI of the sensory patterns 
experienced by individuals of successive generations. Average results of 10 replications. 

Evolved individuals do not circle around objects (as in the case of the experiment
described in the previous section). Instead, all evolved individuals start to
move back and forth and/or left and right when they approach the cylinder. This
emergent behavior can be described as a dynamical system. The relative positions
with respect to an object in which individuals start to move back and forth or left and
right while remaining in proximity to the object can be described as an attractor, since
the robot's trajectory converges on the same relative positions regardless of the
direction of approach to the target. This can be seen in Fig. 3, which shows the
trajectory of the movements produced by an evolved individual while approaching
walls or cylinders (top and bottom, respectively). As can be seen, when the individual
reaches a distance of about 20 mm from an object it avoids walls, while it continues to
approach targets until it reaches the attractor area, located at a distance of about 15
mm and an angle of about 45 degrees. The trajectories of the motor responses in this
area all converge toward the center of the area itself, allowing the individual to keep
more or less the same relative position with respect to the cylinder.

At this point we may wonder about the role of sensory-motor coordination in this
type of emergent behavior. Can we conclude that, as in the experiments described in
the previous section, sensory-motor coordination allows individuals to be exposed to
data in which groups of sensory patterns belonging to different objects (walls and
cylinders in this case) do not strongly overlap?

One way to answer this question is to look at the GSI index through generations.
As we can see from Fig. 2, it stabilizes around 0.8 after the very first generation, while
performance continues to increase through 50 generations. Performance continues to
increase significantly while GSI stabilizes, and GSI stabilizes at a lower value than in
the experiments described in the previous section (about 0.8 instead of 0.9). Together,
these two facts suggest that in this case the ability to experience sensory patterns
belonging to the two objects (walls and cylinders) that are easy to discriminate plays
a less important role.
Fig. 3. Angular trajectories of an evolved individual close to a wall (top graph) and to a target
(bottom graph). The picture was obtained by placing the individual in a random position in the
environment, leaving it free to interact with the environment for 500 cycles, and recording the
change in the relative positions with respect to the two objects for distances lower than 40 mm.
Angle 0 corresponds to objects in front of the robot. For the sake of clarity, arrows are used to
indicate the relative direction but not the amplitude of the movements.

Another way to investigate the role of sensory-motor coordination in this experiment
is to study a case in which the environmental conditions are designed to prevent
the robot from behaving in such a way that the separability between groups of sensory
patterns belonging to different objects increases. Consider the case of a simulated
agent which lives in a circular strip divided into 40 cells (20 cells on the left and 20
on the right). At each time step the agent occupies a single cell and perceives the
sensory state associated with that cell. There are 20 different sensory states that the
agent can perceive, numbered from 0 to 19. Each of them is associated with one cell in
the left part and one cell in the right part of the environment, in a different random
order on each side (see Fig. 4). The agent can react to the current sensory state in two
different ways (move one cell clockwise or anti-clockwise) and has the goal of reaching
and remaining in the left part of the environment.

The agents have a neural network with 20 input units, which locally encode the
perceived sensory state, and 1 output unit, which binarily encodes one
of the two possible actions. As a consequence, only one sensory unit is activated at each
time step. Weights can assume only two values (0 or 1). Therefore the weight
connecting the input unit corresponding to the current sensory state to the output unit
determines the motor reaction of the agent at each time step (move clockwise or
anti-clockwise if the weight is 0 or 1, respectively). Individuals do not have any memory
of the previously experienced sensory states (i.e. they always react in the same way to
a given sensory state).
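A minimal simulation sketch can make this setup concrete. Several choices in it are our assumptions: cells 0 to 19 form the left part and cells 20 to 39 the right part, clockwise means increasing cell index, and fitness is the fraction of epochs that end on the left side. The epoch length of 200 actions and the 100 epochs come from the footnote on the evolutionary setup. All function names are hypothetical.

```python
import random
from typing import List

N_STATES = 20            # sensory states 0..19, each present once per side
N_CELLS = 2 * N_STATES   # assumed indexing: cells 0..19 left, 20..39 right


def make_strip(rng: random.Random) -> List[int]:
    """Sensory state of every cell: a random order on each side of the strip."""
    left, right = list(range(N_STATES)), list(range(N_STATES))
    rng.shuffle(left)
    rng.shuffle(right)
    return left + right


def step(position: int, strip: List[int], weights: List[int]) -> int:
    """One action of a memoryless agent.

    Only the input unit of the current state is active, so its weight alone
    sets the output: 0 -> clockwise (+1), 1 -> anti-clockwise (-1).
    """
    move = 1 if weights[strip[position]] == 0 else -1
    return (position + move) % N_CELLS


def ends_on_left(strip: List[int], weights: List[int], start: int,
                 steps: int = 200) -> bool:
    """Run one epoch and report whether the agent finishes in the left part."""
    position = start
    for _ in range(steps):
        position = step(position, strip, weights)
    return position < N_STATES


def fitness(strip: List[int], weights: List[int], rng: random.Random,
            epochs: int = 100) -> float:
    """Fraction of epochs, each from a random start cell, ending on the left."""
    wins = sum(ends_on_left(strip, weights, rng.randrange(N_CELLS))
               for _ in range(epochs))
    return wins / epochs
```

An agent that always moves clockwise completes exactly five laps in 200 actions and ends where it started. Its fitness therefore only reflects the random starting cells. This illustrates that a uniform reaction to every sensory state cannot solve the task.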




Fig. 4. Two environments. The numbers represent the sensory state experienced by the agent in 
each cell. Each different sensory state is present once in the left and once in the right part of the 
environment. Arrows indicate the motor reaction of a typical evolved agent. 

What is interesting about this experimental situation is that all possible sensory states
are affected by the aliasing problem. For each possible sensory state experienced, in
fact, the animat has a 50% probability of being in the left or in the right part of the
environment. In other words, in this case it is impossible to use sensory-motor
coordination to select sensory states which are not affected by the aliasing problem, or
sensory states in which groups belonging to different categories (left and right side of
the environment in this case) do not strongly overlap (in this case, in fact, all 20
different sensory states are equidistant in the input space).

Despite this, if we evolve a population of agents by selecting those that ended up in
the left part of the environment after 200 cycles, after a few generations we obtain
individuals that are able to move away from the right part of the environment and to
remain in the left part¹. The way in which evolved individuals solve this problem can
be seen by observing the arrows in Fig. 4. In the right part of the environment
individuals consistently move clockwise or anti-clockwise until they abandon the
right side. Conversely, in some areas of the left side of the environment, individuals
start to move back and forth and remain there for the rest of their life. Note
that the way an individual reacts to a particular sensory state (for example the fact that
the individual shown on the left side of Fig. 4 reacts clockwise to the sensory state '3')
does not have any function in itself. The way in which an evolved individual reacts to
a certain sensory state makes sense only if we consider how it reacts to all other
sensory states. Agents solve the problem by reacting anticlockwise to the upper cells
and clockwise to the lower cells in the right portion of the environment. This implies
that there is only a single point on the right side of the environment, no matter where,
in which two adjacent cells are responded to in different ways. This guarantees that
the agent quickly leaves the undesired right side and moves to the desired left side.
When the agent finds itself on the left side of the environment, the different spatial
arrangement of the sensory states on the left side with respect to the right side will
ensure, with a very high probability, that the agent moves within the left side of the
environment without ever leaving it. Therefore the problem is solved by implicitly
exploiting the particular spatial distribution of sensory patterns in the environment.

¹ Evolving individuals were allowed to "live" for 100 epochs, each epoch consisting of 200
actions. Connection weights were binarily represented in the genotype, which was 20 bits
long. Population size was 100. The best 20 individuals of each generation were allowed to
reproduce by generating 5 copies of their genotype with 2% of their bits replaced with a new
randomly selected value. The experiment was replicated 10 times.
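The strategy just described can be checked mechanically. A memoryless agent on the strip keeps moving in one direction until it enters a cell that reverses it. Each pair of adjacent cells that push the agent toward each other is therefore an attractor, and each pair that push it apart is a repeller. The sketch below assumes that cells 0 to 19 form the left part and that clockwise means increasing index. The toy layout in the usage example is ours, not the one in Fig. 4, and the function names are hypothetical.

```python
from typing import List, Tuple

N_LEFT = 20  # assumed indexing: cells 0..19 form the left part of the strip


def moves(strip: List[int], weights: List[int]) -> List[int]:
    """Move produced in each cell: +1 (clockwise) for weight 0, -1 otherwise."""
    return [1 if weights[state] == 0 else -1 for state in strip]


def fixed_points(strip: List[int], weights: List[int]) -> Tuple[List[int], List[int]]:
    """Classify each adjacent pair of cells (p, p+1) on the circular strip.

    Attractor: p moves into p+1 and p+1 moves back into p, so the agent
    oscillates there forever. Repeller: the two cells push the agent apart.
    """
    m, n = moves(strip, weights), len(strip)
    attractors, repellers = [], []
    for p in range(n):
        q = (p + 1) % n
        if m[p] == 1 and m[q] == -1:
            attractors.append(p)
        elif m[p] == -1 and m[q] == 1:
            repellers.append(p)
    return attractors, repellers


def solves_task(strip: List[int], weights: List[int]) -> bool:
    """True if every start cell leads to an attractor lying wholly on the left.

    The agent always reaches some attractor when one exists, so the task is
    solved exactly when all attractors have both cells in the left part.
    """
    attractors, _ = fixed_points(strip, weights)
    n = len(strip)
    return bool(attractors) and all(p < N_LEFT and (p + 1) % n < N_LEFT
                                    for p in attractors)


left = list(range(20))
right = list(range(10, 20)) + list(range(10))
weights = [0] * 10 + [1] * 10  # states 0-9 clockwise, 10-19 anti-clockwise
print(fixed_points(left + right, weights))  # ([9], [29])
print(solves_task(left + right, weights))   # True
print(solves_task(left + left, weights))    # False: same layout on both sides
```

With the same reactions, the agent succeeds or fails depending only on how the sensory states are laid out on the two sides. In the successful layout, the right side contains a single point where adjacent cells are answered differently, and that point is a repeller.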

Note that the way in which these simulated agents solve their task closely
resembles the strategy adopted by the evolved robots described earlier in this section.
The environment has only one dimension in this experiment, while it has two
dimensions in the previous experiment (the third dimension is irrelevant given that the
robot can only move in two dimensions). This explains why agents move clockwise
or anticlockwise in these experiments while robots moved back and forth and left and
right in the previous experiments. On the other hand, the type of strategy adopted by
evolved individuals is the same: react to sensory states so as to produce attractors (i.e.
sets of motor actions that result in a set of movements that, summed together, allow the
individual to remain in the same position) in the left part but not in the right part of
the environment in this experiment; react to sensory states so as to produce an attractor
close to cylindrical objects but not close to walls in the previous experiment.

Obviously this task cannot be solved without exploiting sensory-motor
coordination (e.g. it is impossible to train a network to produce different outputs for
sensory patterns belonging to the left and to the right part of the environment). On the
other hand, it is not possible either to solve this task (1) by using sensory-motor
coordination to select sensory states not affected by the aliasing problem or (2) by
selecting sensory states in which groups belonging to different categories do not
strongly overlap. This means that there is a third way in which sensory-motor
coordination can solve hard tasks. This third way relies on emergent solutions (i.e.
simple solutions relying on the dynamical interaction between the agent and the
environment).



5. Conclusions 

We have seen how sensory-motor coordination (i.e. the ability to select sensory
patterns that are useful for some purpose through motor actions) can solve hard
problems in three different ways: (1) by selecting sensory patterns which are not
affected by the aliasing problem and avoiding those that are; (2) by selecting sensory
patterns in which groups of patterns requiring different motor responses do not
strongly overlap; (3) by reacting to each sensory pattern in a way that maximizes
performance in view of how the agent reacts to all other sensory patterns.

The third way, which we have called emergent behavior, is in a sense more radical
than the other two. When we claim that a sensory pattern is affected by the
aliasing problem or that two groups of sensory patterns that require different
responses overlap strongly, we are implicitly assuming a certain behavioral solution to
a given task (e.g. we are assuming that the agent should react differently to patterns
belonging to different objects or to different sides of the environment). However,
tasks can often be solved in different ways, and only some of these ways may present
problems such as aliasing or lack of regularities among groups of sensory patterns that
require different answers. Let us consider the last experiment. When we first think
about this task we assume that the only way to solve it is to react differently to
sensory states present in the left and in the right part of the environment (at least to
some of them). When we then realize that all sensory patterns are present on both
sides and equally distant in the input space, we feel that there is no way to solve the
problem (at least without taking into account previously experienced states).
However, when we observe the behavior of the evolved agent, we see that there is a
completely different way to solve the problem, which does not require reacting
differently to sensory states lying on the two sides.

When we leave individuals free to find their own solution to a task by interacting
with the external environment, two different processes take place. On the one hand,
individuals select those strategies that are less affected by the aliasing problem or by
the lack of regularities within groups of sensory patterns requiring different answers.
On the other hand, given a certain selected strategy, individuals try to use sensory-motor
coordination to avoid sensory patterns affected by aliasing problems and to
experience sensory patterns so as to increase regularities within groups of sensory
patterns requiring different answers.
Abstract. This paper is concerned with artificial evolution of neurocontrollers
with adaptive synapses for autonomous mobile robots. The
method consists of encoding on the genotype a set of local modification
rules that synapses obey while the robot freely moves in the environment [2].
The synaptic weights are not encoded on the genotype. In the
experiments presented here, a “behavior-based fitness” function gives
reproductive advantage to robots that can solve a sequential task. The
results show that evolved adaptive controllers solve the task much
faster and better than evolved standard (non-adaptive) controllers,
that the method scales up well to large architectures whereas standard
controllers do not, and that evolved adaptive controllers are not trivial
and cannot be reduced to a fixed-weight network.



1 Evolution and Learning 

Artificial evolution of adaptive individuals can provide computational advantages 
and richer adaptive dynamics [1] with respect to evolution of individuals whose 
defining parameters are entirely genetically-determined. Several hypotheses have 
been suggested to explain the observed advantages of the combination of evolution
and learning [7,8,11]. In general, these advantages amount to discovery
of better solutions for a given problem, faster convergence, and improved
robustness in the face of changing fitness landscapes. They are thus relevant for
artificial evolution of robotic control systems.

Despite the growing worldwide interest in Evolutionary Robotics, remark- 
ably little work has been done in this direction. A review of the combination 
of evolution and learning for sensory-motor controllers can be found in [5, 10]. 
Most of the work done so far and effectively applied to robots, or realistically 
simulated organisms, shares two components: all synaptic weights are individu- 
ally specified and directly encoded on the genetic string, and learning amounts 
to some standard gradient-descent algorithm. 

In previous work we employed a different approach where synaptic strengths
are not genetically specified and adaptation during life consists of Hebbian synaptic
changes [2-4]. For each synapse, the genetic string encoded one of four Hebbian
rules, a learning rate, the sign, and the postsynaptic effect of the travelling
signal (driving or modulatory). At the beginning of an individual’s “life”, all
synapses were initialized to small random values and, while the robot was freely
moving around the environment, each synapse could modify its own strength
every 100 ms according to the genetically specified Hebbian rule. Evolved individuals
displayed more robust behaviors [2] and consistently won tournaments
in a competitive co-evolutionary scenario [4].

Fig. 1. A mobile robot equipped with a vision module gains fitness by staying on the
gray area only when the light is on. The light is normally off, but it can be switched on
if the robot passes over the black area positioned on the other side of the arena. The
robot can detect ambient light and the color of the wall, but not the color of the floor.

In this paper, we extend previous work by using a much more compact ge- 
netic representation of adaptive neurocontrollers and systematically compare its 
performance with respect to direct encoding of synaptic weights and to encod- 
ing of noisy synapses. In a further set of experiments, we show that compact 
encoding of adaptive networks scales up to large neurocontrollers whereas direct 
encoding fails. Finally, we analyze a family of evolved controllers under differ- 
ent conditions and show that their competitive advantage comes indeed from 
evolved adaptive synapses. 



2 Environment, task, architecture, and genetic encoding 



A Khepera mobile robot equipped with a vision module is positioned in the
rectangular environment shown in figure 1. A light bulb is attached on one side
of the environment. This light is normally off, but it can be switched on when the
robot passes over a black-painted area on the opposite side of the environment.
A black stripe is painted on the wall over the light-switch area. Each individual
of the population is tested on the same robot, one at a time, for 500 sensory
motor cycles, each cycle lasting 100 ms. At the beginning of an individual’s life,
the robot is placed at a random position and orientation and the light is off.
Fig. 2. The neural controller is a fully-recurrent discrete-time neural network composed
of 12 neurons, giving a total of 12 × 12 = 144 synapses (here represented as small
squares of the unfolded network). 10 sensory neurons receive additional input from one
corresponding pool of sensors positioned around the body of the robot shown on the
left (l=left; r=right; f=front; b=back). IR=Infrared Proximity sensors; L=Ambient
Light sensors; V=vision photoreceptors. Two motor neurons M do not receive sensory
input; their activation sets the speed of the wheels (M > 0.5 forward rotation; M < 0.5
backward rotation).



The fitness function is defined as the number of sensory motor cycles spent
by the robot on the gray area beneath the light bulb when the light is on, divided
by the total number of cycles available (500). In order to maximize this fitness
function, the robot should find the light-switch area, go there in order to switch
the light on, then move towards the light as soon as possible and stand on
the gray area¹. Since this sequence of actions takes time (several sensory motor
cycles), the fitness of a robot will never be 1.0. Also, a robot that cannot manage
to complete the entire sequence will be scored with 0.0 fitness.

A light sensor placed under the robot is used to detect the color of the floor
(white, gray, or black). Its reading is passed to a host computer in order to switch on
the light bulb and compute fitness values. The color of the floor is not given as input
to the neural controller. After 500 sensory motor cycles, the light is switched off
and the robot is repositioned by applying random speeds to the wheels for 5
seconds.
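The following is a rough sketch of how the host computer could score one test from the floor-colour readings described above. The function behavior_based_fitness is hypothetical. It assumes the light stays on until the end of the 500-cycle test once it has been switched on, which the text does not state explicitly.

```python
from typing import Iterable


def behavior_based_fitness(floor_trace: Iterable[str], total_cycles: int = 500) -> float:
    """Fraction of the available cycles spent on the gray area with the light on.

    floor_trace holds the floor colour under the robot at each sensory motor
    cycle ("white", "gray" or "black"), as read by the host computer.
    Assumption: once switched on, the light stays on for the rest of the test.
    """
    light_on = False
    rewarded = 0
    for colour in floor_trace:
        if colour == "black":      # passing over the light-switch area
            light_on = True
        elif colour == "gray" and light_on:
            rewarded += 1
    return rewarded / total_cycles


# 100 cycles to reach the switch and then the gray area, 400 cycles rewarded.
trace = ["white"] * 40 + ["black"] * 5 + ["white"] * 55 + ["gray"] * 400
print(behavior_based_fitness(trace))  # 0.8
```

Only the final outcome is scored. Time spent on gray before the switch is visited counts for nothing, and the intermediate steps of the sequence are never rewarded directly.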

The controller is a fully-recurrent discrete-time neural network (figure 2). It
has access to three types of sensory information: infrared light (object proximity),
ambient light, and vision. The active infrared sensors positioned around the robot
measure the distance from objects (up to 4 cm). Their values are pooled into four
pairs and the average reading of each pair is passed to a corresponding neuron.
The same sensors are used to measure ambient light too. These readings are
pooled into three groups and the average values are passed to the corresponding
three light neurons. The vision module consists of an array of 64 photoreceptors
covering a visual field of 36°. The visual field is divided into three sectors and
the average value of the photoreceptors (256 gray levels) within each sector is
passed to the corresponding vision neuron. Two motor neurons are used to set
the rotation speed of the wheels. Neurons are updated every 100 ms according
to the following equation:

$$y_i \leftarrow \sigma\Big(\sum_{j=1}^{N} w_{ij}\, y_j\Big) + I_i$$

where $y_i$ is the activation of the $i$th neuron, $w_{ij}$ is the strength of the synapse
between presynaptic neuron $j$ and postsynaptic neuron $i$, $N$ is the number of
neurons in the network, $0 < I_i < 1$ is the corresponding external sensory input,
and $\sigma(x) = (1 + e^{-x})^{-1}$ is the sigmoidal function. $I_i = 0$ for the motor neurons.

¹ Notice that the fitness function does not explicitly reward this sequence of actions,
but only the final outcome of the overall behavior chosen by the robot. Therefore,
we call it a behavior-based fitness function.
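The neuron update equation described above can be transcribed directly, as a sketch. We assume that all neurons are updated synchronously from the previous activations and that w_ij already includes the genetically encoded sign. Activations are not clipped, because the text does not mention clipping. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def update_network(y: Sequence[float], w: Sequence[Sequence[float]],
                   inputs: Sequence[float]) -> List[float]:
    """One synchronous update y_i <- sigma(sum_j w_ij * y_j) + I_i.

    w[i][j] is the signed strength of the synapse from presynaptic neuron j to
    postsynaptic neuron i; inputs[i] is the external sensory input I_i,
    which is 0 for the two motor neurons.
    """
    n = len(y)
    return [sigmoid(sum(w[i][j] * y[j] for j in range(n))) + inputs[i]
            for i in range(n)]


# A silent 2-neuron network: the sensory neuron adds its input to sigma(0) = 0.5.
print(update_network([0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 1.0]))  # [0.5, 1.5]
```

In the robot, this function would be called once per 100 ms cycle with the 10 pooled sensor values as inputs and zeros for the 2 motor neurons.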

Each synaptic weight $w_{ij}$ can be updated after every sensory-motor cycle
(100 ms) using one of the four modification rules specified in the genotype.² The
four rules are called Hebbian because they are a function of the presynaptic activation,
of the postsynaptic activation, and of the current value of the weight
itself. The Plain Hebb rule strengthens the synapse proportionally to the correlated
activity of the two neurons. The Postsynaptic rule behaves as the plain
Hebb rule, but in addition it weakens the synapse when the postsynaptic node is
active but the presynaptic is not. Conversely, in the Presynaptic rule weakening
occurs when the presynaptic unit is active but the postsynaptic is not. Finally,
the Covariance rule strengthens the synapse whenever the difference between
the activations of the two neurons is less than half their maximum activity; otherwise
the synapse is weakened. Synaptic strength is maintained within the range
[0, 1] (notice that a synapse cannot change sign) by adding to the modification
rules a self-limiting component inversely proportional to the synaptic strength
itself (see [2, 3] for more details).
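The text describes the four rules only qualitatively and defers the equations to [2, 3]. The sketch below uses one set of equations that matches the description above: strengthening is scaled by (1 - w), weakening is scaled by w, and the covariance term is tanh-shaped. Treat the exact expressions, the rule ordering and the function names as our assumptions rather than the authors' definitions.

```python
import math

# x: presynaptic activation, y: postsynaptic activation, w: current strength in [0, 1].
# Strengthening terms are scaled by (1 - w) and weakening terms by w, which is
# one way to realise the self-limiting component that keeps w inside [0, 1].


def plain_hebb(w: float, x: float, y: float) -> float:
    return (1.0 - w) * x * y


def postsynaptic(w: float, x: float, y: float) -> float:
    # Extra weakening when y is active but x is not.
    return w * (-1.0 + x) * y + (1.0 - w) * x * y


def presynaptic(w: float, x: float, y: float) -> float:
    # Extra weakening when x is active but y is not.
    return w * x * (-1.0 + y) + (1.0 - w) * x * y


def covariance(w: float, x: float, y: float) -> float:
    # f > 0 exactly when |x - y| < 0.5, i.e. less than half the maximum activity.
    f = math.tanh(4.0 * (1.0 - abs(x - y)) - 2.0)
    return (1.0 - w) * f if f > 0 else w * f


RULES = (plain_hebb, postsynaptic, presynaptic, covariance)  # order assumed


def update_weight(rule: int, eta: float, w: float, x: float, y: float) -> float:
    """w <- w + eta * dw; the final clamp is only a safety guard for inputs > 1."""
    return min(1.0, max(0.0, w + eta * RULES[rule](w, x, y)))
```

With activations in [0, 1] and a learning rate of at most 0.9, each change is bounded by 1 - w upward and by w downward, so the weight cannot leave [0, 1]. The clamp only matters for sensory neurons, whose activation can exceed 1 because the external input is added after the sigmoid.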
Table 1. Genetic encoding of synaptic parameters for Synapse Encoding (left) and
Node Encoding (right). In the latter case the sign encoded on the first bit is applied to
all outgoing synapses, whereas the properties encoded on the remaining four bits are
applied to all incoming synapses. A: Genetically determined controllers; B: Adaptive
synapse controllers; C: Noisy synapse controllers.

² These four rules co-exist within the same network.
Two types of genetic (binary) encoding are considered (see Table 1): Synapse
Encoding and Node Encoding. Synapse Encoding is also known as direct encoding
[12]. Every synapse is individually coded on 5 bits, the first bit representing its
sign and the remaining four bits its properties (either the weight strength or its
adaptive rule). Node Encoding instead codes only the properties of the nodes
in the network. These properties are then applied to all their incoming synapses
(consequently, all incoming synapses to a given node have the same properties).
Each node is characterized by 5 bits, the first bit representing its sign and the
remaining four bits the properties of its incoming synapses. Synapse Encoding
allows a detailed definition of the controller, but for a fully connected network
of N neurons the genetic length is proportional to N². Node Encoding instead
requires a much shorter genetic length (proportional to N), but it allows only a
rough definition of the controller.
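The difference in genome size between the two encodings can be made explicit with a short sketch. It also shows how node genes can be expanded into per-synapse settings, following the rule in the Table 1 caption: the sign of a node applies to its outgoing synapses, and its properties apply to its incoming synapses. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

P = TypeVar("P")
BITS_PER_GENE = 5  # 1 sign bit + 4 property bits, for both encodings


def node_encoding_bits(n_neurons: int) -> int:
    return BITS_PER_GENE * n_neurons               # one gene per node


def synapse_encoding_bits(n_neurons: int) -> int:
    return BITS_PER_GENE * n_neurons * n_neurons   # one gene per synapse, N x N


def expand_node_genes(signs: Sequence[int],
                      props: Sequence[P]) -> List[List[Tuple[int, P]]]:
    """Build the (sign, properties) pair of every synapse j -> i.

    The sign comes from the presynaptic node j (applied to its outgoing synapses),
    the properties from the postsynaptic node i (applied to its incoming synapses).
    """
    n = len(signs)
    return [[(signs[j], props[i]) for j in range(n)] for i in range(n)]


for n in (12, 32):  # the 12-neuron controller, and 12 + 20 hidden neurons (Sect. 3.1)
    print(n, node_encoding_bits(n), synapse_encoding_bits(n))
# 12 60 720
# 32 160 5120
```

The printed lengths match the figures given in Sect. 3.1: 60 and 720 bits for the small network, and 160 and 5120 bits once 20 hidden neurons are added.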

Independently of the type of genetic encoding, the following three types of
properties can be encoded on the last 4 bits. A) Genetically determined: weight
strength. The synaptic strength is genetically determined and cannot be modified
during “life”. B) Adaptive synapses: adaptive rule on 2 bits (four rules) and
learning rate (0.0, 0.3, 0.6, 0.9) on the remaining 2 bits. The synapses are always
randomly initialized when an individual starts its life and are then free to change
according to the selected rule. C) Noisy synapses: weight strength on 2 bits and
a noise range on the remaining two bits (0.0, ±0.3, ±0.6, ±0.9). The synaptic
strength is genetically determined at birth, but a random value drawn from
the noise range is freshly computed and added after each sensory motor cycle.
A limiting mechanism cuts off sums that exceed the synaptic range [0, 1]. This
latter condition is used as a control condition to check whether the effects of
Hebbian adaptation amount to random synaptic variability.
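As a sketch of how the four property bits could be read under the three conditions: the learning rates and noise ranges come from the text. The linear scaling of weight strengths to [0, 1] (16 levels for A, 4 levels for C), the most-significant-bit-first reading, the rule numbering and the uniform noise distribution are all our assumptions. The function names are hypothetical.

```python
import random
from typing import Dict, Sequence

LEVELS = (0.0, 0.3, 0.6, 0.9)  # learning rates (B) and noise ranges (C)


def to_int(bits: Sequence[int]) -> int:
    """Read bits most-significant first (assumed bit order)."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def decode_properties(bits: Sequence[int], condition: str) -> Dict[str, float]:
    """Interpret the 4 property bits of a gene under condition A, B or C.

    Scaling of weight strengths to [0, 1] and the rule numbering are assumptions.
    """
    if len(bits) != 4:
        raise ValueError("expected 4 property bits")
    if condition == "A":   # genetically determined strength, 16 levels
        return {"weight": to_int(bits) / 15}
    if condition == "B":   # adaptive: rule index (0-3) and learning rate
        return {"rule": to_int(bits[:2]), "eta": LEVELS[to_int(bits[2:])]}
    if condition == "C":   # noisy: strength (4 levels) and noise range
        return {"weight": to_int(bits[:2]) / 3, "noise": LEVELS[to_int(bits[2:])]}
    raise ValueError(f"unknown condition {condition!r}")


def noisy_strength(weight: float, noise: float, rng: random.Random) -> float:
    """Condition C at one cycle: fresh uniform noise added, then cut to [0, 1]."""
    return min(1.0, max(0.0, weight + rng.uniform(-noise, noise)))
```

In condition C the noise is drawn again at every cycle around the fixed genetic strength rather than accumulated. This matches our reading of "freshly computed", and it is what makes C a control for random synaptic variability.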

In previous work we always resorted to Synapse Encoding and showed that
evolution of adaptive synapses for an obstacle avoidance task reaches levels
of performance similar to those obtained by evolution of genetically-determined
synapses [2-4]. Since in our approach adaptive synapses do not require a specification
of initial strength, in this new set of experiments we have employed Node
Encoding for adaptive synapses and systematically compared it to genetically-determined
controllers using both Synapse Encoding and Node Encoding.



3 Experiments 

The experiments have been carried out in simulation, sampling sensor activations
and adding 5% uniform noise to these values [9]. In addition, we have repeated
the evolutionary experiments for the most significant conditions on the physical
robot. Since the results on the physical robot do not differ significantly from
those obtained in simulation, we report them in the appendix. For each experimental
condition, 10 different³ populations of 100 individuals each have been
independently evolved for 200 generations. Each individual is tested three times
and the fitness value is averaged. The 20 best individuals reproduce by making
5 copies of their genetic string. Strings are crossed over with probability 0.2 and
mutated with probability 0.05 (per bit). In the case of adaptive synapses, synaptic
weights of individuals are randomly initialized within the range [0.0, 0.1] at
the beginning of each test.

³ Using different sequences of random numbers.

Fig. 3. Comparison of adaptive synapses with Node Encoding (left) versus genetically-determined
synapses with Synapse Encoding (center) and Node Encoding (right).
Thick line=best individual; thin line=population average; dashed line=genetic diversity.
Each data point is an average over 10 replications with different random initializations.
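The reproduction scheme described above (100 individuals, the 20 best each producing 5 copies, crossover probability 0.2, per-bit mutation probability 0.05) can be sketched as a single generation step. The text gives only the probabilities. One-point crossover on randomly paired offspring and bit-flip mutation are our assumed operators, and the function name is hypothetical.

```python
import random
from typing import Callable, List

Genome = List[int]


def next_generation(population: List[Genome], fitness: Callable[[Genome], float],
                    rng: random.Random, n_parents: int = 20, copies: int = 5,
                    p_crossover: float = 0.2, p_mutation: float = 0.05) -> List[Genome]:
    """Truncation selection, pairwise crossover and per-bit mutation.

    fitness should already average the three tests of each individual.
    The crossover operator (one-point, on random pairs) and the mutation
    operator (bit flip) are assumptions; the text only gives their probabilities.
    """
    parents = sorted(population, key=fitness, reverse=True)[:n_parents]
    offspring = [list(g) for g in parents for _ in range(copies)]
    rng.shuffle(offspring)
    for k in range(0, len(offspring) - 1, 2):
        if rng.random() < p_crossover:
            a, b = offspring[k], offspring[k + 1]
            cut = rng.randrange(1, len(a))
            a[cut:], b[cut:] = b[cut:], a[cut:]   # swap the tails
    for genome in offspring:
        for i in range(len(genome)):
            if rng.random() < p_mutation:
                genome[i] = 1 - genome[i]
    return offspring


# Toy usage: 60-bit genomes (Node Encoding of 12 neurons), fitness = number of ones.
rng = random.Random(42)
pop = [[rng.randint(0, 1) for _ in range(60)] for _ in range(100)]
for _ in range(10):
    pop = next_generation(pop, sum, rng)
```

With 20 parents and 5 copies each, the population size stays at 100. The copies are fresh lists, so mutating an offspring never alters its parent or a sibling.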

The fitness results reported in figure 3 show that individuals with adaptive
synapses and Node Encoding (graph on the left) are much better than individuals
with genetically-determined synapses and Synapse Encoding (graph in the
center) in that: a) both the fitness of the best individuals and that of the population
reach higher values (0.6 against 0.5); b) they reach the best value obtained
by genetically-determined individuals in less than half the number of generations
(40 against more than 100); c) they display much less variability across generations.
Individuals evolved with genetically-determined synapses and Node Encoding (graph
on the right) never managed to complete the task reliably in any of the ten
replications. The genetic variance⁴ of the populations of adaptive individuals is
reduced more markedly than in all other conditions, probably indicating a more
reliable selection of individuals and preservation of genetic building blocks.
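The genetic diversity measure is defined in footnote 4 as the average dispersion of the genomes from the population's center of mass, normalized by the string length. A minimal sketch follows. Taking dispersion to mean Euclidean distance is our assumption, and the function name is hypothetical.

```python
import math
from typing import Sequence


def genetic_diversity(population: Sequence[Sequence[int]]) -> float:
    """Mean Euclidean distance of the genomes from their centre of mass,
    divided by the string length (the distance measure is an assumption)."""
    n, length = len(population), len(population[0])
    centre = [sum(g[i] for g in population) / n for i in range(length)]
    mean_distance = sum(math.dist(g, centre) for g in population) / n
    return mean_distance / length


print(genetic_diversity([[0, 1, 0, 1]] * 10))           # 0.0: converged population
print(genetic_diversity([[0, 0, 0, 0], [1, 1, 1, 1]]))  # 0.25
```

A population converging on a single genotype drives this value toward zero. This is the kind of reduction the dashed lines in figure 3 track.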

Two sets of control experiments, one using Synapse Encoding (figure 4, left)
and the other Node Encoding (figure 4, right), have been carried out using
noisy synapses in order to check whether the improvements obtained by evolving
adaptive synapses were simply due to a random sampling of the fitness
surface surrounding each individual. In both cases the results were considerably
worse than those obtained with adaptive synapses (figure 3, left) and than those
obtained with genetically-determined synapses (figure 3, center).



⁴ Measured as the average dispersion of individual vectors from the center of mass of
the population, further normalized by the string length.
Fig. 4. Evolution of noisy synapses using Node Encoding (left) and Synapse Encoding
(right). Thick line=best individual; thin line=population average; dashed line=genetic
diversity. Each data point is an average over 10 replications with different random
initializations.



3.1 Scaling up 

The choice of a neural architecture is often difficult and may affect the outcome 
of an experiment. A large architecture may be computationally more powerful, 
but it may also entail a larger genotype and stronger epistatic effects. Unless one 
knows that a larger search space for the genotype/phenotype mapping considered 
has the same proportion of solutions as a smaller one, shorter genotypes may be
preferable because evolutionary search could be faster and more effective.

We have performed a new series of experiments using a larger neural network. 
The architecture showm in figure 2 was extended by adding 20 hidden neurons. 
These neurons were fully connected to themselves and to other neurons in the 
network, but did not receive sensory input and were not used to set the speeds 
of the wheels. The length of the genetic string grows from 60 to 160 bits for 
Node Encoding and from 720 to 5120 bits for Synapse Encoding. The results 
shown in figure 5 indicate that evolution of adaptive synapse with Node En- 
coding reports fitness values still comparable to the case of a smaller network; 




Fig. 5. Evolution of a large controller with 20 hidden nodes. Left: Adaptive synapses 
with Node Encoding. Right: Genetically-determined synapses with Synapse Encod- 
ing. Thick line=best individual; thin line=population average; dashed line=genetic 
diversity. Each data point is an average over 10 replications with different random 
initializations. 
Fig. 6. Behaviors of three best individuals with adaptive synapses and Node Encoding
(left column) and of three best individuals with genetically-determined synapses and
Synapse Encoding (right column). Individuals belong to the last generation of three 
different replications (randomly chosen out of ten) for each condition. When the light is 
turned on, the trajectory line becomes thick. The corresponding fitness value is printed 
on the top of each box along with the average fitness of the same individual tested ten 
times from different positions and orientations. 



instead, evolution of genetically-determined controllers with Synapse Encoding 
is badly affected in this condition. Evolution of genetically-determined synapses 
with Node Encoding (data not shown) remained close to zero fitness, whereas
evolution of synaptic strength and noise range with both Node Encoding and 
Synapse Encoding reported the same results as for the smaller network (data 
not shown). 
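The genotype lengths quoted above can be checked with a little arithmetic. They are consistent with 5 bits per gene, 12 nodes in the original network, and one gene per ordered pair of nodes (self-connections included) under Synapse Encoding. These values are our inference from the quoted lengths and are not stated in this section. The function name is hypothetical.

```python
def genotype_bits(n_nodes: int, bits_per_gene: int, encoding: str) -> int:
    """Genotype length for a fully connected network of n_nodes (self-connections included)."""
    if encoding == "node":        # one gene per node
        return n_nodes * bits_per_gene
    if encoding == "synapse":     # one gene per synapse: n_nodes * n_nodes connections
        return n_nodes * n_nodes * bits_per_gene
    raise ValueError(f"unknown encoding: {encoding!r}")


BITS = 5  # inferred: (160 - 60) bits / 20 added nodes
for n in (12, 32):  # 12 = 60 / 5 (inferred); 32 = 12 + 20 hidden nodes
    print(n, genotype_bits(n, BITS, "node"), genotype_bits(n, BITS, "synapse"))
# 12 60 720
# 32 160 5120
```

The quadratic growth of the synapse count explains why Synapse Encoding grows from 720 to 5120 bits while Node Encoding grows only from 60 to 160.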

The fact that genetically-determined controllers with Synapse Encoding perform
badly may indicate that the search space here contains proportionally fewer
solutions than the smaller search space of the network pictured in figure 2. The 
slower convergence and slightly lower fitness values of the controller with adap- 
tive synapses (compare with left graph of figure 3) may be explained by the 
increased length of the genetic string, but also by the fact that the architecture 
is fixed and fully connected. Since in Node Encoding the properties of a node
propagate to all incoming synapses, there might be a high number of “parasitic”
connections that cannot be individually eliminated. We shall come back to this 
point in the final discussion. 



4 Behavioral Analysis 

Figure 6 shows the behaviors of three best individuals evolved with adaptive 
synapses and Node Encoding (left) and with genetically-determined weights and 
Synapse Encoding (right). In both cases individuals aim at the area with the 
Fig. 7. Disabling adaptation for three best individuals with adaptive synapses (shown
on the left of figure 6). Left column: Synapses are initialized to random values in the 
range [0.0, 0.1], as during evolution. Center column: Synapses are all initialized to 1.0. 
Right column: Synapses are set to their average value recorded during a full test of the 
individual. The corresponding fitness value is printed on the top of each box along with 
the average fitness of the same individual tested ten times from different positions and 
orientations. The values are always 0.0 because none of the individuals ever manage to 
complete the task under these test conditions. 



light switch† and, once the light is turned on, they move towards the light and
remain there. The higher fitness of the adaptive controllers (given on the top of
each box, see figure caption) is due to straighter and faster trajectories, whereas
genetically-determined individuals display loopy trajectories (and are sometimes
unable to stand still on the fitness area, as in the case of the third
individual at the bottom right of the figure).

Another set of tests has been carried out to assess the role of adaptation
in the behavior of the individuals with adaptive synapses. For example, one
might argue that what matters is the sign of the synapse and not its strength
as long as it is non-zero, or that adaptive synapses may have the same effect as
fixed synapses with strengths set to their average values‡.

† Their performance is badly affected if the vision input is disabled, indicating that
they do not use random search to locate the switch (data not shown).

‡ This latter suggestion was made by Flotzinger [6], who replicated our previous
experiments on Adaptive Synapses with Synapse Encoding.

The same three best individuals with adaptive synapses shown in the left column of
figure 6 were tested again with adaptation disabled in three different conditions
(figure 7). In the first condition the synapses were initialized to small random values
in the range [0.0, 0.1] (figure 7, left column), as during evolution. In the second
condition, the weights were all set to their maximum strength 1.0 (figure 7, center
column). In the third condition the weights were set to their average values (figure 7,
right column); these averages had been computed beforehand by testing the robot
in adaptive mode and recording the synaptic strength of each connection after
every update. For each condition, the three individuals were tested ten
times from different positions and orientations. None of the individuals ever
managed to complete the task in any of the three conditions.
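The three test conditions with adaptation disabled amount to three ways of choosing fixed synaptic strengths. This is a minimal sketch. The function name and the data layout (a list of per-update snapshots of all strengths) are hypothetical, and the robot test itself is not modelled.

```python
import random
from typing import List, Optional, Sequence


def frozen_weights(condition: str, n_synapses: int,
                   recorded: Optional[Sequence[Sequence[float]]] = None,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Fixed synaptic strengths for a test run with adaptation disabled."""
    if condition == "random":    # small random values in [0.0, 0.1], as during evolution
        rng = rng or random.Random()
        return [rng.uniform(0.0, 0.1) for _ in range(n_synapses)]
    if condition == "max":       # every synapse at its maximum strength 1.0
        return [1.0] * n_synapses
    if condition == "average":   # per-synapse mean of strengths logged after every update
        if not recorded:
            raise ValueError("'average' needs strengths recorded during an adaptive test")
        steps = len(recorded)
        return [sum(snapshot[k] for snapshot in recorded) / steps for k in range(n_synapses)]
    raise ValueError(f"unknown condition: {condition!r}")


log = [[0.2, 0.8], [0.4, 0.6]]  # two hypothetical snapshots of two synapses
print(frozen_weights("average", 2, recorded=log))
```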



5 Conclusions 

We have shown through a set of systematic comparisons that evolution of adaptive
synapses brings a number of advantages with respect to evolution of synaptic
weights. It can generate viable controllers in far fewer generations, and the
evolved controllers display better-performing behaviors. Since adaptive synapses
here need not be specified on the genetic string because their strength is al- 
ways randomly initialized at the beginning of an individual’s test, this approach 
can rely on a very compact genetic encoding that specifies only the adaptive 
properties of individual nodes. Such a compact encoding scales up very well to 
large networks with many synapses. The data obtained from control experiments 
with noisy synapses and from behavioral tests of evolved individuals with adap- 
tation disabled all suggest that Hebbian adaptation plays a specific role in the 
functioning of the controllers both during evolution and during the “life” of an 
individual. 
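Node Encoding is compact because each node's genes stand for all of its incoming synapses, and those synapses start from random strengths. This is a minimal sketch. It assumes a fully connected network and initial strengths drawn uniformly from [0.0, 0.1], the range used during evolution according to the caption of figure 7. The gene fields and rule labels are hypothetical, because the adaptation rules themselves are not given in this section.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class NodeGene:
    rule: str     # hypothetical field: label of the adaptation rule used by the node's synapses
    rate: float   # hypothetical field: adaptation rate


Synapse = Tuple[NodeGene, float]  # (inherited adaptive properties, current strength)


def expand_node_encoding(genes: List[NodeGene],
                         rng: random.Random) -> Dict[Tuple[int, int], Synapse]:
    """Fully connected network in which synapse (pre, post) inherits the gene of node `post`."""
    n = len(genes)
    return {(pre, post): (genes[post], rng.uniform(0.0, 0.1))  # random initial strength
            for post in range(n) for pre in range(n)}


genes = [NodeGene("rule_a", 0.2), NodeGene("rule_b", 0.5), NodeGene("rule_c", 0.8)]
net = expand_node_encoding(genes, random.Random(0))
print(len(net), net[(0, 1)][0] == net[(2, 1)][0])  # 9 True
```

Because only the node genes are inherited, the genotype stays the same size however many synapses the network has. The same sharing also explains why individual "parasitic" connections cannot be eliminated one by one.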

When describing our controllers with changing synapses, we have deliberately
avoided the term “learning” because we have no evidence that the controller
acquires new knowledge or skills, or that it may easily acquire new abilities for 
a different task (implementing, for example, something functionally similar to 
reinforcement learning). However, we have used the term “adaptation” because 
synapses change according to the states of the sensors and of the other neurons 
in the controller of the robot. In other words they adapt their initial random 
configuration to a dynamically-stable configuration that depends on the behavior 
of the robot. The adaptation rules are genetically specified and have evolved to 
satisfy a specific fitness function. As they stand, our results indicate that we 
have developed a smart genetic specification of neural controllers suitable for 
evolution. One of our current projects aims at establishing to what extent
this approach can scale up to more complex behavioral tasks and to other more
traditional learning problems. Another project aims at testing (and possibly 
extending) this approach for behavioral problems where learning is traditionally 
considered necessary. 

We think that evolution of adaptive synapses may be very suitable for evolv- 
ing neural morphologies where one cannot specify the strength of individual 
Fig. 8. Comparison of adaptive synapses with Node Encoding (left) versus genetically-
determined synapses with Synapse Encoding (right) for experiments carried out on the
physical robot. 



synapses on the genotype and at the same time wishes to keep the genetic string 
as compact as possible. The methods proposed so far for evolution of morpholo- 
gies all need very complex genetic encoding, require much domain-specific knowl- 
edge (e.g., symmetries, connectivity types), and have not yet been shown to be 
competitive with direct-coding methods. The Node Encoding scheme that we 
have proposed may be a first step in the direction of morphology evolution in 
the sense that synapse details are not specified in the genetic code, but are taken 
care of by adaptive online rules. In a current project we are extending our ap- 
proach by adding genes for expression of connection growth and recursive rules 
to the node specification. 

Appendix: Evolution on the physical robot 

Two sets of experiments have been repeated on the physical robot: adaptive 
synapses with Node Encoding and genetically-determined synapses with Synapse 
Encoding (figure 8). The main differences from the experiments carried out in simulation
are: the population size is 80, each run lasts 40 generations, only one run has
been carried out for each condition, and each individual is tested only once in
the environment. Because of this last restriction, chance has a stronger effect on
the measured performance, which explains the larger oscillations observed,
especially for the individual with genetically-determined synapses. These data
should be compared to those shown in the graphs at the left and center of
figure 3. The performance obtained with the physical robots is better than
that obtained in simulation because the simulation imposes more severe constraints. For
example, when a simulated robot pushes against a wall, it cannot move unless it 
backs away; instead, real robots can often get away by sliding against the walls. 
Abstract. This paper briefly reviews synthetic approaches to neurobiology
and presents results of two experiments on the use of evolutionary
algorithms for the design of neural controllers for locomotion. The first
experiment consists of using the evolutionary algorithm to instantiate
low-level parameters of a connectionist simulation of the lamprey’s
locomotor circuitry. The second experiment develops potential neural
circuits for the swimming and trotting of the salamander, an animal whose
locomotor circuitry has not yet been decoded. In both cases, biologically
plausible control circuits are developed which produce neural
activity with many similarities to that measured in the real animals.



1 Synthetic approaches to neurobiology 

The fields of artificial life and artificial intelligence have developed tools and
methods which have the potential to significantly help computational neurobiology.
Synthetic approaches to neurobiology can indeed increase our understanding
of the central nervous system at two levels.

At a high, behavioural level, fields such as computational neuroethology
[1, 2] and synthetic psychology [3] investigate how behaviour results
from neural circuits through the development of neural controllers for artificial
animats (robots or simulations). Models of escape and feeding behaviours in the
frog [4], insect locomotion [1, 5], fly vision [6, 7], cricket phonotaxis [8] and
classical conditioning [9] have, for instance, been simulated and/or implemented in
real robots. These studies investigate hypotheses on central nervous systems by
embedding neural models into bodies (simulated or real) in interaction with an 
environment. Compared to more traditional computational neurobiology, an
interesting aspect of these investigations is that they test the completeness
of a model, that is, they verify whether all elements necessary for the production
of an observed behaviour have been taken into account. They are also useful for
analysing the effect of having a real body in terms of sensory feedback and body 
dynamics. Finally, their synthetic nature, i.e. the fact that the developed neural
models, although biologically plausible, do not necessarily correspond to
existing mechanisms, makes them useful for investigating possible control mechanisms
and, potentially, for inspiring new neurobiological measurements.

** A large part of this work was carried out while the author was at the University of
Edinburgh in the Department of Artificial Intelligence.

At a lower level, techniques from artificial neural networks can be used as 
tools for completing neurobiological models. Backpropagation algorithms have,
for instance, been used to instantiate the synaptic weights of connectionist models
of the escape reflex in the leech [10] and of the locomotor circuit of the stick
insect [5]. More recently, evolutionary algorithms have been used for setting
parameters of compartmental models of single neurons [11], or for defining synaptic
weights in a model of the salamander’s visual system [12]. The interesting
outcome of these approaches is the development of tools which automatically in- 
stantiate multiple parameters of complex non-linear systems modelling biological 
circuits, given a description of their observed output. 

We next present two experiments in which a genetic algorithm is used
to generate parts of control circuits for anguilliform locomotion. In the first
experiment, the genetic algorithm is used to instantiate synaptic weights of a
neural circuit whose general structure is well known, the locomotor circuitry
of the lamprey, while in the second experiment it is used to generate
potential neural controllers for the locomotion of the salamander, an animal whose
locomotor circuitry has not yet been decoded.

2 Design of the lamprey’s swimming controller 

2.1 Ekeberg’s connectionist model 

The lamprey, one of the earliest vertebrates, swims using an anguilliform
swimming gait, i.e. by propagating a travelling undulation from head to tail. Its
locomotor circuitry has been studied in detail by neurobiologists (see [13] for a 
review), and is known to be a central pattern generator (CPG) made of a chain 
of approximately 100 segmental oscillators located in the spinal cord (Figure 1). 

Several models of that circuitry have been developed, and the present work
builds, in particular, on the connectionist model developed by Ekeberg [14].
That model simulates the complete 100-segment CPG of the lamprey organ- 
ised as illustrated in Figure 1. It is composed of neuron units modelled as leaky 
integrators with a saturating transfer function which represent populations of 
functionally similar neurons in the real lamprey. The output u of a neuron unit 
corresponds to the mean firing frequency of the population it represents ($u \in [0, 1]$)
and is calculated as follows: 

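$$\dot{\xi}_{+} = \frac{1}{\tau_D}\Big(\sum_{i \in \Psi_{+}} u_i w_i - \xi_{+}\Big) \qquad (1)$$

$$\dot{\xi}_{-} = \frac{1}{\tau_D}\Big(\sum_{i \in \Psi_{-}} u_i w_i - \xi_{-}\Big) \qquad (2)$$

$$\dot{\vartheta} = \frac{1}{\tau_A}\,(u - \vartheta) \qquad (3)$$

$$u = \begin{cases} 1 - \exp\{(\Theta - \xi_{+})\Gamma\} - \xi_{-} - \mu\vartheta & (u > 0) \\ 0 & (u \le 0) \end{cases} \qquad (4)$$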
Fig. 1. Lamprey’s swimming controller. The controller is made of 100 interconnected
segmental oscillators (only 4 segments shown) composed of 8 neurons each. Four types
of neurons are present in the oscillators: 3 types of interneurons EIN, CIN and LIN and
the motoneurons MN. Sensory feedback is provided by stretch sensitive edge cells EC. 
The dashed lines indicate the projections from segmental connections to neighbouring 
segments. 



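where $w_i$ are the synaptic weights, $\Psi_{+}$ and $\Psi_{-}$ represent the groups of
presynaptic excitatory and inhibitory neurons respectively, $\xi_{+}$ and $\xi_{-}$ are the
delayed ‘reactions’ to excitatory and inhibitory input, and $\vartheta$ represents the
frequency adaptation observed in some real neurons; $\tau_D$ and $\tau_A$ are time
constants, and $\Theta$, $\Gamma$ and $\mu$ are the threshold, gain and adaptation
parameters of the unit.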
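The neuron unit described above combines leaky integration of excitatory and inhibitory input, a saturating output, and slow frequency adaptation, and it can be simulated in a few lines. This is a minimal sketch. It assumes forward-Euler integration and placeholder parameter values that are not Ekeberg's. The class and attribute names are ours.

```python
import math
from dataclasses import dataclass


@dataclass
class EkebergUnit:
    """One neuron unit (a population of similar neurons), integrated with forward Euler.

    All parameter values are placeholders, not Ekeberg's published values.
    """
    tau_d: float = 0.03      # time constant of the delayed reactions xi+ / xi- (s)
    tau_a: float = 0.4       # time constant of frequency adaptation (s)
    threshold: float = 0.1   # Theta; positive here so that the unit is silent at rest
    gain: float = 1.8        # Gamma
    mu: float = 0.3          # adaptation strength
    xi_plus: float = 0.0     # delayed reaction to excitatory input
    xi_minus: float = 0.0    # delayed reaction to inhibitory input
    adapt: float = 0.0       # frequency adaptation (vartheta)
    u: float = 0.0           # output: mean firing frequency, in [0, 1) for non-negative inputs

    def step(self, exc_sum: float, inh_sum: float, dt: float) -> float:
        """Advance by dt; exc_sum / inh_sum are sum(u_i * w_i) over excitatory / inhibitory inputs."""
        self.xi_plus += dt / self.tau_d * (exc_sum - self.xi_plus)
        self.xi_minus += dt / self.tau_d * (inh_sum - self.xi_minus)
        self.adapt += dt / self.tau_a * (self.u - self.adapt)
        raw = (1.0 - math.exp((self.threshold - self.xi_plus) * self.gain)
               - self.xi_minus - self.mu * self.adapt)
        self.u = max(0.0, raw)   # saturating output, clipped at zero
        return self.u


# Constant excitation: the output rises quickly, then sags as adaptation builds up.
unit = EkebergUnit()
trace = [unit.step(1.0, 0.0, dt=0.001) for _ in range(2000)]
print(round(max(trace), 3), round(trace[-1], 3))
```

The adaptation variable integrates more slowly than the excitatory and inhibitory reactions. As a result, the output under constant drive first peaks and then settles lower, which is the frequency adaptation described in the text.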

The model is able to produce the following behaviours observed in the real
lamprey: 1) when excitation is applied to the neurons of the different segmental
oscillators, the segmental circuits develop an oscillatory activity with a frequency 
proportional to the level of excitation; 2) applying extra excitation to segments 
closest to the head leads the system to oscillate with small phase lags between 
segments which are constant over the spinal cord, therefore producing the typical 
wave of neural activity observed in anguilliform swimming; 3) for a given level of
extra excitation, the wavelength of the undulation is independent of the oscilla- 
tion frequency. Furthermore, when the motoneuron signals are used to determine 
the muscular activity of the simple mechanical simulation of the lamprey that 
Ekeberg developed, a swimming gait is produced which is very similar to that 
of real lampreys. 



2.2 Parameter instantiation using a genetic algorithm 

The implementation of a model such as Ekeberg’s requires a significant amount 
of time for the setting of a large number of parameters, including the neuron 
parameters and the synaptic weights of all connections (Ekeberg, personal
communication). We present here how a genetic algorithm can be used as a tool
for automatically instantiating those parameters, given a description of the
desired behaviour of the system. This experiment follows earlier evolutions of “artificial”
controllers for swimming [15], i.e. controllers without the lamprey’s connectivity.
For a more detailed description of the results, see [16].

The evolved controllers are composed of the same type of neurons as those 
of Ekeberg and their connectivity corresponds to that observed in the lamprey. 
The design process is carried out in three stages: first the development of
segmental oscillators, then the development of intersegmental coupling and finally
the development of sensory feedback connections from stretch sensitive cells. 

Genetic algorithm. The same real number genetic algorithm is used for the 
three stages. Genes are real numbers between 0.0 and 1.0 which directly encode 
parameters of the neural controller (see below). Parent chromosomes are chosen
with a rank-based probability, and child chromosomes are created with a
2-point crossover and a mutation operator. Mutation consists of modifying the old
gene value by a small random value. 
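The genetic algorithm described above can be sketched as follows. This is a minimal sketch and not the authors' implementation. This section does not specify the linear rank weighting, the mutation probability and step size, the clipping of genes to [0.0, 1.0], or elitism (carrying the best chromosome over unchanged), so all of these are our assumptions. The function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Genome = List[float]


def rank_select(pop: Sequence[Genome], fits: Sequence[float], rng: random.Random) -> Genome:
    """Rank-based selection: the worst chromosome gets weight 1, the best gets weight N."""
    order = sorted(range(len(pop)), key=lambda i: fits[i])
    weights = [0] * len(pop)
    for rank, idx in enumerate(order, start=1):
        weights[idx] = rank
    return pop[rng.choices(range(len(pop)), weights=weights, k=1)[0]]


def two_point_crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Child copies a, except for the segment between two random cut points, taken from b."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:]


def mutate(genes: Genome, rng: random.Random, rate: float = 0.1, sigma: float = 0.05) -> Genome:
    """With probability `rate`, shift a gene by a small Gaussian value; keep it in [0, 1]."""
    return [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) if rng.random() < rate else g
            for g in genes]


def evolve(fitness: Callable[[Genome], float], n_genes: int, pop_size: int = 100,
           generations: int = 50, seed: int = 0) -> Tuple[Genome, List[float]]:
    """Return the best chromosome found and the best fitness of each generation."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        fits = [fitness(g) for g in pop]
        best = max(range(pop_size), key=lambda i: fits[i])
        history.append(fits[best])
        children = [pop[best]]  # elitism: an assumption, not stated in the text
        while len(children) < pop_size:
            a, b = rank_select(pop, fits, rng), rank_select(pop, fits, rng)
            children.append(mutate(two_point_crossover(a, b, rng), rng))
        pop = children
    return max(pop, key=fitness), history


# Toy usage: 13 genes (as in Stage 1) pulled towards an arbitrary target value of 0.7.
best, history = evolve(lambda g: -sum((x - 0.7) ** 2 for x in g), n_genes=13,
                       pop_size=40, generations=60)
print(round(history[0], 3), round(history[-1], 3))
```

In the experiments, the toy fitness would be replaced by a simulation of the neural controller scored against the desired behaviour. The selection, crossover and mutation steps stay the same across the three stages.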

Stage 1: segmental oscillators. In this stage, the synaptic weights of the 26 
connections within one segment are evolved. Because a left-right symmetry is 
assumed, chromosomes have 13 genes, which directly encode a synaptic weight 
through a linear transformation. The fitness function is defined to reward solu- 
tions which 1) produce regular motoneuron oscillations, and 2) have a frequency 
and an amplitude of oscillations which increase with the level of external exci- 
tation (for the mathematical definition of the function see [16]). 
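The Stage 1 mapping from genotype to weights can be illustrated as follows. This is a minimal sketch. It assumes that each gene in [0.0, 1.0] is mapped linearly onto a weight range [w_min, w_max], whose bounds here are placeholders because the actual range is not given in this section. It also assumes that each gene sets both a connection on one side and its mirror-image connection on the other. The function name is hypothetical.

```python
from typing import List


def decode_segment(genes: List[float], w_min: float = 0.0, w_max: float = 3.0) -> List[float]:
    """Map 13 genes in [0, 1] to the 26 synaptic weights of one segmental oscillator.

    Entries 0-12 are the connections of one side; entries 13-25 are their mirror
    images, which share the same weights under the left-right symmetry assumption.
    w_min and w_max are placeholder bounds for the linear transformation.
    """
    if len(genes) != 13:
        raise ValueError(f"expected 13 genes, got {len(genes)}")
    one_side = [w_min + g * (w_max - w_min) for g in genes]  # linear gene -> weight map
    return one_side + one_side


weights = decode_segment([0.5] * 13, w_min=0.0, w_max=2.0)
print(len(weights), weights[0], weights[13])  # 26 1.0 1.0
```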

Ten runs were carried out with populations of 100 chromosomes for 500 
generations. All populations converged to best solutions oscillating regularly and 
covering a large range of frequencies. Interestingly, the range of frequencies of 
the evolved oscillators (e.g. from 0.9 to 11.0 Hz) is much closer to that observed 
in the real lamprey (from 0.25 to 10.0 Hz) than Ekeberg’s segmental oscillator 
(from 1.7 to 5.6 Hz). 

Stage 2: intersegmental coupling. The second stage consists of developing 
the coupling connections between segmental oscillators. In the lamprey,
oscillators are coupled through projections of segmental connections towards
neighbouring segments. The extent of the projections is currently not known in
detail, and for that reason Ekeberg chose a simplified coupling in which all seg-
mental connections project symmetrically in the rostral and caudal directions 
except for the connections from the CIN neurons which project more caudally. 

Here the GA is used to investigate potential coupling configurations between 
100 copies of a chosen segmental oscillator. The chromosome encodes the extent 
of the projections of each segmental connection for both the rostral and caudal 
direction. The fitness function is defined to reward solutions which are able 1) to 
produce regular oscillations of the motoneurons in all segments, 2) to produce 
a travelling wave whose wavelength can be modulated by the extra excitation 
applied to the segments closest to the head, and 3) to produce swimming gaits
covering a large range of speeds. 
Fig. 2. Top: swimming gait produced by the optimised lamprey’s controller. Bottom:
Corresponding neural activity in the 50th segmental oscillator (left) and in the mo- 
toneurons along the left side of the spinal cord (right). 



Five runs were carried out with populations of 40 chromosomes for 100 generations.
All five runs converged to controllers with performance comparable to, and in some
respects better than, that of Ekeberg’s biological model: in particular, they cover
larger ranges of lags and can reach slightly higher speeds. The range of lags of the
best solution, for instance, varies between 0.0 and 3.5% of the oscillation period,
which corresponds to the lags observed in the real lamprey (up to 3.0%) when local
concentrations of excitatory baths are varied [17]. Figure 2 illustrates the swimming
gait produced by one of the evolved controllers.
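These lag values are easier to interpret when converted into the number of waves present along the body at any instant. This is a minimal sketch. It assumes that the lags are phase lags between consecutive segments, expressed as a percentage of the oscillation period, which is our reading and is not stated explicitly here. It also assumes a 100-segment chain as in Ekeberg's model. The function name is hypothetical.

```python
def wavelengths_on_body(lag_percent: float, n_segments: int = 100) -> float:
    """Number of complete waves along the body for a given lag between consecutive segments."""
    # Each segment lags its neighbour by lag_percent % of a cycle, so the head-to-tail
    # phase difference is n_segments * lag_percent / 100 cycles, i.e. that many wavelengths.
    return n_segments * lag_percent / 100.0


for lag in (0.0, 1.0, 3.5):
    print(lag, wavelengths_on_body(lag))
# 0.0 0.0
# 1.0 1.0
# 3.5 3.5
```

Under this reading, a lag of about 1% per segment puts roughly one full wavelength on a 100-segment body.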



Stage 3: sensory feedback from stretch sensitive cells. The last evo- 
lutionary stage consists of evolving the synaptic weights of sensory feedback 
connections from stretch sensitive cells. The lamprey has a series of inhibitory
and excitatory stretch sensitive cells, the edge cells, located on both sides of
the body, which project to the segmental oscillators [18]. In [19], Ekeberg
demonstrated that these cells could be useful for crossing a speed barrier (a local
area in which the water flows faster).

In order to further investigate how sensory feedback could be best used by 
the swimming CPG, the GA was used to generate weights for these feedback 
connections, given a fitness function rewarding the capacity to progress against
the speed barrier with as little deviation as possible.

In all 5 runs (populations of 100 chromosomes, 100 generations), controllers
were generated that were capable of crossing the chosen speed barrier (15 cm wide,
with a speed 40% higher than the lamprey’s swimming speed). Interestingly, the
evolved sensory feedback pathways correspond very closely to those observed
in [18]: for all established (inhibitory and excitatory) biological connections, the
evolved controllers developed sensory feedback connections with the same sign.

3 Design of the salamander’s locomotor controller 

This second experiment concerns the salamander, an animal whose locomotor
circuitry has not yet been decoded. Here the aim of the synthetic approach
is to investigate which kinds of neural circuits can produce the observed
gaits of the salamander.



3.1 Neurobiology of the salamander’s locomotor circuitry 

A salamander swims like a lamprey, and on land it switches to a trotting
gait with the body producing a standing wave coordinated with the movements 
of the limbs [20]. It has been hypothesised that the neural circuitry capable 
of producing both the travelling and the standing wave is based on a similar 
organisation to that of the lamprey [21, 22]. 



[Figure labels: Trunk; Tail; Extensor; Flexor; Anterior limb and trunk; Posterior limb and tail]

Fig. 4. Organisation of the evolved controllers for the salamander.
3.2 Evolution of potential locomotor controllers for the salamander

Following that assumption, we use a genetic algorithm to generate synaptic
weights of a controller made of a lamprey-like body CPG and two limb oscillators
which are copies of the body’s segmental oscillators (Figure 4). The limb
oscillators project to the motoneurons of the limbs and to the segmental oscillators
of the body CPG, creating a unilateral coupling from the limb oscillators to the
body CPG.¹ A simple 2D mechanical simulation of a salamander-like animat is
developed for testing the swimming and trotting gaits (Figure 3, see [23] for a
more detailed description). The aim is to be able to switch between the swimming
and the trotting gaits by applying external excitation either to the body CPG
alone or to both the body and the limb CPGs, respectively.

Fig. 5. Trotting (left) and swimming (right) salamander.

Fig. 6. Neural activity during trotting. Left: Neural activity in the limb oscillators (Ma
and Mp represent the motoneuron activity of body segments 5 and 95, respectively).
Right: Motoneuron activity along the left side of the body.

Chromosomes encode the synaptic weights of all possible connections from
the two limb oscillators, as well as the connections from the brain stem to the limb
motoneurons. The fitness function is defined to reward solutions which 1) trot
as fast as possible, 2) can cover a large range of speeds when the excitation
applied to both the body and the limb CPGs is varied, and 3) can change the
direction of motion when left-right asymmetrical excitation is applied. The same
real-number genetic algorithm as in the lamprey experiment is used.

¹ This configuration is more biologically plausible than the one used in initial
experiments, in which there was no coupling between the swimming and the trotting
CPGs [23].
Ten evolutions with populations of 100 chromosomes are carried out for 50
generations. All but three evolutions converged to controllers exhibiting a trotting
gait with a trunk-limb coordination very similar to that of the real salamander (Figure 5,
left). The speed of the trotting can be increased by increasing the amount of
excitation, and applying a small asymmetry of excitation between the left and
right sides of the CPGs makes the salamander trot in a circle. Finally, a
lamprey-like swimming gait can be produced when excitation is applied only to
the body CPG (Figure 5, right).

During trotting, the effect of the unilateral coupling from the limb oscillators
to the body CPG is to force the anterior and posterior parts of the body to
oscillate in antiphase (Figure 6, right). Interestingly, the timing of the flexor and
extensor limb motoneurons relative to the body motoneurons is very similar
to that measured in the real salamander [22].

4 Discussion 

These two experiments illustrate how a genetic algorithm can be used as a
tool for neurobiological modelling. The interesting features of the method are:

1. GAs allow automatic instantiation of multiple parameters in complex non- 
linear models of central nervous systems. The evolution of controllers for the 
lamprey illustrates, for instance, that the GA can generate a significant part
of the model that Ekeberg designed by hand.

2. Specific characteristics specified by the user can be optimised. It was, for 
instance, possible to optimise the frequency range of Ekeberg’s model and 
to obtain a better fit of biological data. 

3. As illustrated with the salamander, the GA can also be used to investigate 
potential control mechanisms for biological systems whose structure is not
yet known.

Compared to more traditional learning algorithms for artificial neural networks,
such as variants of the backpropagation algorithm, a GA
has the advantage that the fitness function does not need to be differentiable and 
that the desired output of the system can be described at a higher level. There 
is no need to provide a specific output cycle that the network should learn, and 
the desired behaviour of the system can, for example, be described in terms of 
a desired range of frequencies or the capacity for fast swimming. Note that the
GA is not used here as a simulation of natural evolution, and that, similarly to
[24, 25, 26], the staged evolution approach taken here corresponds rather to an
“engineering” approach to artificial life.

It is hoped that, in the case of the salamander, this synthetic approach can
provide new ideas for neurobiological measurements, and that a back-and-forth
process between modelling and measurements on the real animal will lead to
a progressive improvement of the model by incorporating new neurobiological
data as they become available.

Finally, the types of connectionist controllers developed here may also be useful
for robots using animal-like locomotion. The neural controllers are capable of
transforming simple commands into the multiple rhythmic signals sent to the
different actuators for efficient locomotion. They present the interesting property 
that by simply varying the amplitude of the commands, the speed, direction and 
type of locomotion can be modulated. 

5 Conclusion 

This paper briefly reviewed synthetic approaches to neurobiology and presented
two experiments on the use of genetic algorithms for designing connectionist
models for anguilliform locomotion. In these experiments, the genetic algorithm
is used for instantiating synaptic weights of neural circuits whose structure cor- 
responds to that decoded (for the lamprey) or hypothesised (for the salamander) 
in the real animal. It is found that 1) the GA is successful in automatically in- 
stantiating variables which require a long time to be set by hand, and 2) it can 
generate solutions which optimise high level characteristics specified by the user 
such as the speed of locomotion of a mechanical simulation. 
Abstract. To be useful in psychology, "artificial organisms" have to perform
tasks comparable to those performed by animals. One way to achieve this is to
replicate actual animal experiments. Here we reproduce an experiment showing 
"detour behavior" in chicks - a behavior usually explained in terms of "cogni- 
tive maps" or other forms of internal representation. We artificially evolve 
software-simulated robots with a "generic" ability to detour. Sensor-motor 
physics are carefully calibrated with data from a physical robot. Robot archi- 
tecture is constrained to exclude internal representation. The evolutionary proc- 
ess rewards exploratory skills as well as detour behavior. Robot performance 
matches the results achieved in the original experiment. This proves that inter- 
nal representations are not a necessary condition for primitive detour behavior 
and suggests that "detouring" evolves naturally from simpler behaviors. Future 
research will show whether it is possible to evolve more complex detour abili- 
ties using a similar bottom-up strategy. 



1 Introduction 

Experimental animal psychology is based on the study of animals performing well-defined
tasks in closely controlled conditions. Psychologists use descriptions of
behavior to infer animals’ cognitive abilities, inputs and outputs, algorithms and internal
representations [1]. Comparative studies provide insight into general mechanisms of
perception and cognition. This mature tradition has produced a vast volume of reli- 
able data and well-tried methodologies which pose a severe challenge for A-life re- 
searchers. An artificial organism, to be useful, has to perform tasks comparable to 
those that animals perform in the laboratory or the wild [2]: it has to acquire input 
from a noisy environment using imperfect sensors; motor mechanisms have to be 
based on realistic physics. Last but not least, the behavior of the organism has to be 
measured and recorded with the precision animal psychologists have come to de-
mand. One way to achieve these goals is to replicate experiments from the past, no 
longer with animals but with robots produced by artificial evolution. In this paper we 
apply this approach to so-called "detour behavior", the ability of an animal to
negotiate an obstacle to reach a target.



2 Related work 



2.1 A-life and Evolutionary robotics as a tool in psychological research 

In 1984 V. Braitenberg suggested that it was possible to gain insight into problems of 
sensory-motor coordination by designing robots exhibiting specific forms of behavior 
[3]. In recent years a number of researchers have begun to build robots deliberately
designed to test hypotheses in cognitive science. Robots have been built to
simulate insect locomotion [4], to orient towards the mating calls of male crickets,
as female crickets do [5], and to test theoretical models of animal navigation [6].

The challenge of designing "embodied" robots capable of operating in complex envi- 
ronments has been a source of important insights. Psychologists and computer scien- 
tists have come to recognize the difficulty of the tasks faced by autonomous agents 
and the unimagined simplicity of some of the possible solutions [7] [8]. 

While one school of researchers was developing "adaptive robotics", a second school
worked on the "artificial evolution" of software-based "animats". This approach
avoids the limitations associated with manual design. Animats have been "evolved"
with a broad range of "interesting" behaviors, such as pursuit and evasion [9]. Critics
have, however, charged that animats fail to address the complex problems inherent
in the design of physical sensory-motor systems [10].

One way of combining the strengths of adaptive robots and "artificial evolution" is to 
build simulations which effectively reproduce the behavior of a physical robot. In our 
group [11] [12], we have developed a methodology for calibrating simulations with 
data from physical robots. In recent work we have "bred" robots which exhibit efficient
exploratory behavior [13]. Another theme of recurrent interest has been so-called
"detour behavior" [14]. 



2.2 Detour behavior and its interpretations 

When an animal seeks to reach a target it often meets an obstacle. Often the only way
to reach the target is to take an indirect route during which the animal loses visual
contact with the target. This is known as "detour behavior". Detour behavior has been
demonstrated in many animals including chimpanzees [15], rats [16] and two-day-old
chicks [17].
For Tolman and Honzik [18], detour behavior is incompatible with behaviorist theory.
If there is no stimulus, the animal's behavior cannot be a response. Detour behavior,
they argue, requires "cognitive maps" of spatial relationships. As Bennett has pointed
out [19], however, every experiment which has been explained in terms of "cognitive
maps" can also be interpreted in different, often simpler, ways. The neurological
evidence is inconclusive. Recordings of single-neuron activity in the hippocampus
appear to indicate the existence of "place cells" which fire when an experimental
animal is in a given location [20]. This finding is supported by evidence that
hippocampal lesions disrupt landmark-based navigation [21]. Recently, however, it has
been shown that neurons thought to be "place cells" may also respond to non-locational
data [22].



2.3 Detour behavior in 2 day old chicks 

The work reported in this paper replicates experiments by Regolin et al. [19] which 
have demonstrated the existence of detour behavior in two day old chicks. In these 
experiments the chicks are placed in a white cage divided in two by a barrier (see 
Figure 1). The part of the cage on the other side of the barrier contains a corridor. At
the end of the corridor facing the target there is a grill through which the chicks can
see the target. On each side of the corridor there are apertures leading into two
compartments, one facing the target and one facing in the opposite direction. The two
compartments facing the target are labeled C and D; the two compartments facing in
the other direction are labeled A and B.




Figure 1: Diagram of apparatus for chick experiment

At the beginning of each session a chick is placed in the corridor close to the barrier 
and allowed to wander freely. The experimenter records the first compartment the 
chick enters and the time it takes to reach it. After 10 minutes the experiment is 
halted. 
The results of the experiment show that of 25 chicks tested, 5 failed to leave the
corridor within the allotted time. Of the remaining 20 animals, 18 chose compartments C
or D; two chose compartment A or B. The excess of birds choosing the correct
compartment was statistically significant (χ² = 12.80, df = 1, p < 0.001). There was no
significant difference between the number of chicks entering compartment C and the
number entering compartment D. These results demonstrate that the chicks were able
to turn towards the goal in the absence of locally orienting cues. This, it is argued,
shows the ability to maintain a representation of the goal after the loss of perceptual
contact. Additional experiments (not replicated in this paper) showed that on repeated
trials successful chicks would sometimes choose compartment C and sometimes
compartment D. This seems to show that the birds "did not learn a fixed response,
(i.e. turn left or right) but a position in space in egocentric coordinates" (i.e. turn left
or right depending on the previous direction of turn).
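The reported statistic can be checked by hand. Assuming the null hypothesis is an even split of the 20 chicks that left the corridor between the correct pair of compartments (C, D) and the incorrect pair (A, B), a one-degree-of-freedom goodness-of-fit test on 18 versus 2 gives the quoted value. The sketch below is a minimal check, not the original authors' analysis; it uses the standard identity that the upper tail of a χ² distribution with one degree of freedom equals erfc(√(χ²/2)).

```python
import math


def chi_square_gof(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# 18 chicks chose C or D and 2 chose A or B; assumed null hypothesis: a 10/10 split.
observed = [18, 2]
expected = [sum(observed) / 2] * 2
chi2 = chi_square_gof(observed, expected)
p = p_value_df1(chi2)
print(f"chi2 = {chi2:.2f}, p = {p:.5f}")
```

Each cell contributes (8²)/10 = 6.4, so χ² = 12.8, and the resulting p lies below the 0.001 threshold quoted in the text.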



3 Objectives and methods 

The goal of the work reported in this paper was to "evolve" robots with the ability to 
replicate the behavior of the chicks yet whose architecture categorically excluded the 
presence of "cognitive maps" or other forms of internal representation. An analysis of
the strategies, sensors and architectures used by successful robots would, we hoped,
lead to interesting explanatory hypotheses for detour behavior.

The experiments used a genetic algorithm operating on a population of 100 robots, 
simulated in software. The simulation software was designed to precisely emulate the 
well-known Khepera robot [23]. Input to the robot came from 8 infrared proximity 
sensors, 4 sensors linked to a linear video-camera and 3 "time sensors". The use of 
time sensors was motivated by previous work in which we showed that such sensors 
improve the efficiency of exploratory behavior [13]. Proximity sensors have a sen- 
sory field of 20° and are sensitive to obstacles within 3 cm of the sensor. Output 
(between 0 and 1) is a continuous, decreasing function of distance to the obstacle. 
The video-camera has a field of vision of 36°. Each of its four sensors produces an output
of 1 if the center of the target lies within that sensor's own 9° sector. The output values of the
three time sensors were initially set to zero, increasing respectively by 0.01, 0.02 and 
0.03 on each cycle of computation. When a sensor value reached 1 it was reset to 
zero. 
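The sensor layer described above can be sketched as follows, producing the 15 network inputs (8 proximity, 4 camera, 3 time). This is an illustrative sketch, not the authors' simulator. The proximity response is only described as continuous and decreasing, so the linear fall-off to zero at 3 cm is an assumption. The bearing sign convention is also assumed. So is the reading that a time sensor which reaches or passes 1 is reset to 0: the 0.03 sensor steps from 0.99 to 1.02.

```python
from typing import List, Optional, Sequence

IR_RANGE_CM = 3.0          # obstacles are sensed within 3 cm
CAMERA_FIELD_DEG = 36.0    # total field of the linear camera
CAMERA_SEGMENTS = 4        # four 9-degree camera "sensors"
TIME_STEPS = (1, 2, 3)     # increments in hundredths: 0.01, 0.02, 0.03 per cycle


def proximity_output(distance_cm: float) -> float:
    """Decreasing response in [0, 1]; the linear fall-off is an assumption."""
    if distance_cm >= IR_RANGE_CM:
        return 0.0
    return 1.0 - max(distance_cm, 0.0) / IR_RANGE_CM


def camera_outputs(target_bearing_deg: Optional[float]) -> List[int]:
    """Binary output per 9-degree segment; 1 if the target centre falls in it.

    target_bearing_deg: target direction relative to the robot's heading
    (positive = left, assumed convention), or None if the target is not seen.
    """
    out = [0] * CAMERA_SEGMENTS
    half = CAMERA_FIELD_DEG / 2
    if target_bearing_deg is None or abs(target_bearing_deg) > half:
        return out
    width = CAMERA_FIELD_DEG / CAMERA_SEGMENTS
    idx = min(int((target_bearing_deg + half) // width), CAMERA_SEGMENTS - 1)
    out[idx] = 1
    return out


class TimeSensors:
    """Three sawtooth sensors, stored as integer hundredths to avoid float drift."""

    def __init__(self) -> None:
        self.ticks = [0, 0, 0]

    def step(self) -> List[float]:
        for i, inc in enumerate(TIME_STEPS):
            self.ticks[i] += inc
            if self.ticks[i] >= 100:   # reached (or passed) 1.0: reset to zero
                self.ticks[i] = 0
        return [t / 100 for t in self.ticks]


def input_vector(ir_distances_cm: Sequence[float],
                 target_bearing_deg: Optional[float],
                 time_values: Sequence[float]) -> List[float]:
    """Concatenate 8 proximity, 4 camera and 3 time inputs (15 in total)."""
    if len(ir_distances_cm) != 8 or len(time_values) != 3:
        raise ValueError("expected 8 proximity readings and 3 time values")
    return ([proximity_output(d) for d in ir_distances_cm]
            + [float(v) for v in camera_outputs(target_bearing_deg)]
            + list(time_values))
```

With these assumptions the three time sensors have periods of 100, 50 and 34 cycles. Their differing phases are what let a purely reactive network respond differently to identical external stimuli.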

The motor apparatus consisted of a left and a right wheel driven by stepping motors 
which can move both forwards and backwards. The motor apparatus was controlled 
by an Artificial Neural Network (ANN) with input neurons representing the state of 
the sensors and output neurons controlling the stepping motors. A number of different 
architectures were tested. The architecture finally chosen was based on a simple Per- 
ceptron [24] in which all sensors have a direct connection to the two output units. 
Evolution involved "mutations" in connection strengths. The genome of the organism 
consisted of a sequence of binary coded numbers (8 bits per number) representing the 
strengths of individual connections. 
We evolved "detour behavior" by applying "artificial selection" to the population of
ANNs. In the initial population connection strengths for individual networks were set 
to random real values uniformly distributed between -1 and 1. 
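The controller described above can be sketched as a single-layer network. Its 15 inputs are fully connected to 2 motor outputs, giving 30 weights and a 240-bit genome. The text does not specify the activation function, bias terms, how an 8-bit number maps to a connection strength, or how outputs drive the wheels. The sketch therefore assumes no bias and a linear mapping of 0..255 onto [-1, 1]. It also assumes a logistic output rescaled to a signed wheel speed in (-1, 1), so the wheels can turn forwards or backwards. Random bits give weights spread uniformly over 256 levels in [-1, 1], which approximates the uniform initialization described.

```python
import math
import random
from typing import List, Sequence

N_INPUTS = 15    # 8 proximity + 4 camera + 3 time sensors
N_OUTPUTS = 2    # left and right wheel motors
BITS_PER_WEIGHT = 8
GENOME_BITS = N_INPUTS * N_OUTPUTS * BITS_PER_WEIGHT  # 240 bits


def decode(genome: Sequence[int]) -> List[float]:
    """Turn each 8-bit field into a weight in [-1, 1] (assumed linear scaling)."""
    weights = []
    for k in range(0, len(genome), BITS_PER_WEIGHT):
        value = int("".join(str(b) for b in genome[k:k + BITS_PER_WEIGHT]), 2)
        weights.append(-1.0 + 2.0 * value / 255)
    return weights


def wheel_speeds(weights: Sequence[float], inputs: Sequence[float]) -> List[float]:
    """Direct input-to-output connections with no hidden units (perceptron)."""
    speeds = []
    for j in range(N_OUTPUTS):
        row = weights[j * N_INPUTS:(j + 1) * N_INPUTS]
        activation = sum(w * x for w, x in zip(row, inputs))
        out = 1.0 / (1.0 + math.exp(-activation))  # logistic, in (0, 1)
        speeds.append(2.0 * out - 1.0)              # signed speed, in (-1, 1)
    return speeds


def random_genome(rng: random.Random) -> List[int]:
    """Uniformly random bits: weights spread evenly over 256 levels in [-1, 1]."""
    return [rng.randint(0, 1) for _ in range(GENOME_BITS)]
```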

In preliminary work it became clear that it would never be possible to evolve detour- 
capable robots without rewarding intermediate steps in the evolutionary process as 
well as the ultimate goal. We hypothesized that detour behavior might be derived 
from other, more primitive forms of exploration and food-seeking, e.g. the ability to
move towards a visible target, to negotiate an obstacle and to efficiently search an 
open space. We therefore designed fitness formulae and evaluation protocols so as to 
reward these abilities individually even when they did not lead to successful detours. 
Fitness was recalculated on each cycle of computation, using the following algorithm: 

IF (some infrared sensor > 0 AND old_position != new_position) fitness += 1

IF (some infrared sensor > 0 AND old_position == new_position) fitness -= 1

IF (distanceToTarget < 15 cm) fitness += 10

The first two components in the fitness formula were designed to encourage obstacle 
avoidance; the last component rewarded the robot when it approached the target. 
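The per-cycle fitness update can be rendered as a short Python function. The data representation is an assumption: eight infrared readings as a list and positions as (x, y) tuples. "old_position = new_position" is read as "the robot did not move during the cycle", e.g. because it was stuck against an obstacle.

```python
from typing import Iterable, Sequence, Tuple

Position = Tuple[float, float]


def fitness_increment(ir_readings: Sequence[float], old_pos: Position,
                      new_pos: Position, distance_to_target_cm: float) -> int:
    """Fitness change for one cycle of computation."""
    near_obstacle = any(r > 0 for r in ir_readings)
    delta = 0
    if near_obstacle and old_pos != new_pos:
        delta += 1   # reward moving while an obstacle is sensed
    if near_obstacle and old_pos == new_pos:
        delta -= 1   # penalise being stuck against an obstacle
    if distance_to_target_cm < 15:
        delta += 10  # reward being within 15 cm of the target
    return delta


def episode_fitness(cycles: Iterable[tuple]) -> int:
    """Sum the per-cycle increments over one test (600 cycles in the paper)."""
    return sum(fitness_increment(*c) for c in cycles)
```

Because the target bonus is paid on every cycle spent within 15 cm, robots that reach the target early and stay near it accumulate most of their score from the last component.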

The authors wished to simulate natural evolution and to avoid the emergence of ad- 
aptations specific to a particular experimental setting. To achieve this goal robots 
were evaluated in four different environments (see Figure 3). Each environment con- 
sisted of an open field with no external fence. In the first environment there was no 
obstacle between the robot and the target. The fitness formula rewarded robots which 
successfully searched for the target and moved towards it. The second, third and 
fourth environments selected for actual detour behavior. In the second environment 
the target was placed behind a linear obstacle 80 cm long. In the third environment 
Khepera was placed inside a 10 × 80 cm corridor. The fourth environment used a
40 × 70 cm U-shaped obstacle. Obstacles were 3 cm high and of negligible thickness;
the target was 12 cm high. It follows that in the "evaluation sessions" the robot was
always able to "see" the target even when the path to the target was obstructed by an
obstacle. The evaluation test was repeated five times for each environment. At the
beginning of each test the robot was placed in a randomly chosen position 90 cm
from the target. The heading was chosen randomly from a uniform distribution. Each
test consisted of 600 cycles of computation. 
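The evaluation protocol amounts to 4 environments × 5 tests × 600 cycles, or 12,000 cycles per robot. The sketch below shows only that schedule. The robot simulator is represented by a caller-supplied function `run_test`, which is hypothetical. Start positions are drawn uniformly on a circle of radius 90 cm around the target. That is an assumption: the text does not say on which side of the obstacle the robot was placed.

```python
import math
import random
from typing import Callable, Optional, Tuple

ENVIRONMENTS = ("open field", "linear obstacle 80 cm",
                "corridor 10 x 80 cm", "U-shaped obstacle 40 x 70 cm")
TESTS_PER_ENVIRONMENT = 5
CYCLES_PER_TEST = 600
START_DISTANCE_CM = 90.0


def random_start(target_xy: Tuple[float, float], rng: random.Random):
    """Random point 90 cm from the target plus a uniformly random heading."""
    angle = rng.uniform(0.0, 2 * math.pi)
    x = target_xy[0] + START_DISTANCE_CM * math.cos(angle)
    y = target_xy[1] + START_DISTANCE_CM * math.sin(angle)
    heading = rng.uniform(0.0, 2 * math.pi)
    return (x, y), heading


def total_fitness(genome, run_test: Callable, target_xy=(0.0, 0.0),
                  rng: Optional[random.Random] = None) -> float:
    """Sum of fitness over 5 tests in each of the 4 environments.

    run_test(genome, environment, start_xy, heading, n_cycles) is a
    hypothetical simulator call returning the fitness earned in one test.
    """
    rng = rng or random.Random()
    total = 0.0
    for env in ENVIRONMENTS:
        for _ in range(TESTS_PER_ENVIRONMENT):
            start, heading = random_start(target_xy, rng)
            total += run_test(genome, env, start, heading, CYCLES_PER_TEST)
    return total
```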

When all robots had been tested, individual robot fitness scores were summed over
each of the five tests in each of the four test environments. The 20 robots with the 
highest overall score were selected for "reproduction". Each of the selected robots 
produced 5 offspring. Reproduction was asexual. During the cloning process "muta- 
tions" were introduced by flipping bits in the genome with a probability of 0.02 per 
bit per generation. This process was iterated for 350 generations. Each simulation was 
repeated six times using a different random number seed on each occasion. 
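One generation of the selection scheme above can be sketched as follows. It keeps the best 20 of 100 robots, gives each 5 asexual offspring, and flips each bit independently with probability 0.02. Since 20 × 5 = 100, the population size stays constant. Tie-breaking among equal scores relies on Python's stable sort, which is an arbitrary choice not taken from the text.

```python
import random
from typing import List, Sequence

N_PARENTS = 20
OFFSPRING_PER_PARENT = 5
P_BIT_FLIP = 0.02


def mutate(genome: Sequence[int], rng: random.Random, p: float) -> List[int]:
    """Flip each bit independently with probability p."""
    return [1 - b if rng.random() < p else b for b in genome]


def next_generation(population: Sequence[Sequence[int]],
                    scores: Sequence[float],
                    rng: random.Random,
                    p_mutation: float = P_BIT_FLIP) -> List[List[int]]:
    """Keep the 20 best genomes; each produces 5 mutated clones (asexual)."""
    ranked = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
    parents = [population[i] for i in ranked[:N_PARENTS]]
    return [mutate(parent, rng, p_mutation)
            for parent in parents
            for _ in range(OFFSPRING_PER_PARENT)]
```

Iterating this for 350 generations, once per random seed, reproduces the overall loop described in the text, given some fitness function to produce `scores`.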
Finally, the 4 best robots produced in the last generation of each of the simulations
were tested in a replica of Regolin et al.'s experimental apparatus. It is important to
note that in this setting the obstacle completely obstructed the robot's view of the
target. As in Regolin's work, the results of the experiment were given by the number
of robots choosing the correct compartments within a pre-determined duration (600
cycles of computation).



4 Results 

4.1 Evolution of generic detour behavior 
Figure 2: Fitness scores averaged over the 6 simulations.






Figure 3: Typical trajectories followed by robots in the four training environments.
Small circles mark the terminal points of trajectories (if within the field box).
Figure 2 shows the score achieved by the fittest robot in the population, the mean
score achieved by robots selected for reproduction and the mean score for the whole
population. As can be seen, the fitness score for the best organism in the population
initially increased rapidly, rising from 2640 in generation 0 to 6500 in generation 50.
Between generation 50 and generation 180 fitness levels oscillated around a
stationary level of approximately 7000. From generation 180 to the end of the simulation
fitness scores again rose, reaching a level of 14,000 in generation 350. A qualitative
examination of the trajectories followed by individual robots shows that by the end of
the evolutionary process all robots selected for reproduction were exhibiting
satisfactory detour behavior in all four environments (see Figure 3).



4.2 Performance in the experimental setting 

Table 1 compares the performance of the 24 robots with that of the chicks in Regolin
et al.'s experiment. As can be seen from the table, 1 of our 24 robots failed to enter one
of the compartments within the allotted time (5 of 25 chicks in the original experiment).
Of the remaining robots 22 entered the correct compartment (20 in the original work)
and 3 chose the wrong compartment. As in the original experiment, there was no
significant difference between the number of robots entering compartment C and the
number entering compartment D.

Table 1. Performance of robots and chicks in the experimental setting

          Did not leave corridor   Sector A   Sector B   Sector C   Sector D   Total
Chicks    5                        2          3          9          11         25
Robots    1                        0          2          11         11         24



There is no statistically significant difference between the results achieved in our 
experiment and those reported in Regolin et al. We therefore conclude that our simu- 
lation successfully replicates the results of the original experiment. 

An examination of individual trajectories shows few differences among individuals;
the trajectories followed by robots choosing Sector C are a rough mirror image of
those used to reach Sector D. A typical trajectory is shown in Figure 4.
Figure 4: A typical robot trajectory in the experimental apparatus



5 Discussion 

In our work every neuron in the ANN directly represented either an input or an
output; there were no hidden neurons. In brief, the robots we evolved had no access
either to maps or to any other internal representation of location.

In different simulations robots evolved different strategies. We have not as yet per- 
formed a detailed analysis either of the strategies themselves or of the underlying 
computational mechanisms. In general, however, they appear to be based on simple
rules, for example (see Figure 3):

1) Main: Move forward, turning first clockwise (slowly) and then
anti-clockwise (more rapidly).

2) Taxis: On visual contact with the target, turn sharply towards
the target. Return to Main.

3) Wall following: If the left proximity sensors are active, turn
right; if the right sensors are active, turn left. Move forwards
until the obstacle is out of view. Return to Main.
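The three rules can be read as a small priority-ordered controller. The sketch below is only a reading aid. It is not the evolved network, which is a perceptron with no explicit rules. Several choices are assumptions made for illustration: the priority order (wall following, then taxis, then Main), the wheel-speed values, the 4.5° "centred" tolerance, and the use of a single time-sensor phase to switch between the two turning phases of Main.

```python
from typing import Optional, Tuple


def rule_controller(left_ir: bool, right_ir: bool,
                    target_bearing_deg: Optional[float],
                    phase: float) -> Tuple[float, float]:
    """Return hypothetical (left_wheel, right_wheel) speeds.

    left_ir / right_ir: True if any proximity sensor on that side is active.
    target_bearing_deg: bearing of the target (positive = to the left),
        or None if the target is outside the camera's field of view.
    phase: value in [0, 1) taken from a periodic time sensor.
    On a differential-drive robot a faster left wheel turns it clockwise.
    """
    # Rule 3, wall following (given top priority in this sketch).
    if left_ir:
        return (1.0, 0.2)   # turn right, away from an obstacle on the left
    if right_ir:
        return (0.2, 1.0)   # turn left, away from an obstacle on the right
    # Rule 2, taxis: steer sharply towards a visible target.
    if target_bearing_deg is not None:
        if abs(target_bearing_deg) <= 4.5:
            return (1.0, 1.0)  # roughly centred: go straight
        return (0.2, 1.0) if target_bearing_deg > 0 else (1.0, 0.2)
    # Rule 1, Main: a long, gentle clockwise phase followed by a short,
    # sharper anti-clockwise phase, alternating with the time-sensor phase.
    if phase < 0.7:
        return (1.0, 0.8)
    return (0.4, 1.0)
```

Because Main keeps the robot sweeping its heading back and forth, the target re-enters the camera's field periodically and taxis can take over again. This is one way to read the claim that the robots never permanently lose visual contact.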

Input from the time sensors allows the robot to generate differentiated responses to 
identical external stimuli. It is the time sensors which make it possible for the robot to 
change its direction of motion in the absence of any external stimulus. The combina- 
tion of exploration and taxis, which these sensors make possible, represents an effi- 
cient strategy for moving towards the target even when it is often outside the cam- 
era’s field of vision. It is this strategy (rather than an explicit representation of target 
position) which enables the robots to perform successfully in the experimental setting. 
The observation that artificially evolved robots can generate a particular behavior 
using simple behavioral rules does not demonstrate that chicks use the same rules or 
that cognitive maps or target representations do not exist. There is, in fact, at least
some evidence that chicks and robots use different rules. Individual chicks are re- 
ported to have no particular preference for left over right turns; individual simulation 
runs on the other hand produce asymmetrical strategies (though the asymmetry may 
be reversed from one simulation to the next). 

What our work does show is that it is possible to generate at least the simplest forms
of detour behavior without resorting to internal representations. It should be added that
feed-forward control networks like those used in our experiments have no internal 
representation of past states. They cannot, in other words, follow the rule Regolin et 
al. suggest for their chicks, turning left or right depending on their previous direction 
of turn. 

Our work suggests alternative models for detour behavior and the behavioral primi- 
tives on which this behavior is likely to depend. The ability to efficiently explore an 
environment, to locate food, to move towards the food and to negotiate obstacles, are 
of fundamental importance for a broad range of animal species. The results of our 
experiment suggest that primitive detour behavior may, in fact, be a relatively simple 
extension of these basic exploratory capabilities. What they seem to show is that it is 
possible to generate primitive detour behavior on the basis of nothing more than: (a) a 
robot's ability to move towards a target, (b) a strategy guaranteeing that it will never 
permanently lose visual contact and (c) a simple "wall following" routine.

None of this implies, of course, that all forms of detour behavior can be explained so 
simply. The evidence in favor of cognitive maps is still, at this stage of the game, 
relatively convincing. And yet a doubt remains. If elementary detour behavior can 
evolve, step by step, from simple behavioral primitives, might it not be possible to 
evolve more complex detour abilities using a similar bottom-up approach? This is an 
issue which we will address in future experimental work. 
Abstract. In this paper we introduce the notion of historical evidence
(the ability to replicate biologically realistic evolutionary scenarios)
for hypothesised mechanisms for control of sensorimotor behaviour. We
apply the idea to the phonotaxis mechanism proposed by Webb and her
collaborators to account for the abilities of Gryllus bimaculatus. To do
this, we tested whether the proposed control mechanism, when implemented
in a robot model of the animal, could account for evolutionary
adaptations observed in the natural system. We describe and discuss
the experiment and its results, but start by explaining the methodology
used, which is an extension of Webb's existing methodology for obtaining
behavioural evidence for a hypothesised mechanism. We conclude there
is historical evidence for the neural control mechanism investigated.



1 Introduction 

At the previous European Conference on Artificial Life (ECAL-97), Lund et al.
presented their work on a robot model of the cricket species Gryllus bimaculatus
[1]. The model, devised by Barbara Webb [2], offers a hypothesised control
mechanism to account for the ability of female crickets to reliably approach their
mates using only cues available in the male calling song. This work, through
replication of studies performed on the animals themselves, provides evidence
that the hypothesised mechanism is sufficient to account for much of the variety
and detail of the female cricket's phonotactic behaviour. However, the paper
already alluded to, and other work with the model (e.g. [3]), provide what we
shall call behavioural evidence: that is, the experimental methods focus on
demonstrating that the behaviour evoked by the controller, when implemented
in a robot model, is statistically comparable to that observed by ethologists in
experiments on the real animal.

In this paper we consider the possibility of an alternative source of evidence 
for the sufficiency of the control mechanism, which we call historical evidence. 
Any mechanism in use in nature must not only account for the behaviour it 
engenders, but must also have been generated by evolution. We therefore consider,
in this work, the extent to which the hypothesised controller alluded to
above can account for evolutionary adaptations in the cricket. If the controller
can support evolutionary histories similar to those seen in the real animal, this
provides additional support for its plausibility. (More strictly, if it cannot sup- 
port the kind of evolutionary histories seen in the animal species, it becomes less 
plausible that it is the natural mechanism even though it may account for the 
species’ behaviour very well.) 



1.1 A Methodology for Evolutionary Bio-Robotics 

The methodology employed here is an extension of the one developed by Webb 
for studying perceptual systems by building artificial ones [2,4,5]. The central 
idea of this methodology is to provide hypotheses and evidence for control mech- 
anisms generating sensorimotor behaviour observed in animals by using physical 
computational models (often robots). Webb [2] emphasises three principal re- 
quirements in her methodology: (1) model one particular biological sensorimotor 
system from a specific animal, (2) use physical (hardware) models, and (3) use 
established evaluation methods. These requirements stem from validity consid- 
erations: the model developed should be ecologically valid, hence the focus on a 
specific animal; physical models are used to finesse the difficulties of accurately 
simulating the (usually) complex physics of the sensorimotor interactions being 
studied; the evaluation and analysis of the physical model should be comparable 
to that of the animal, to allow generalisation of conclusions from the former to 
the latter. From such comparisons, behavioural evidence for a hypothesised
mechanism can be obtained.

We can extend this methodology to provide historical evidence by considering 
the evolutionary context of the animal system under study: if the hypothesised 
mechanisms modeled in the robot are those found in the target animal, we should 
be able to simulate evolutionary change in the real animal with the model. For 
this to work, we add two further items to Webb’s list of methodological require- 
ments: (4) find an evolutionary scenario, in the biological literature, which is 
plausible for (or preferably observed in) the target animal, and (5) find parame- 
ters in the hypothesised mechanism that have a genetic disposition and that are 
relevant for the chosen scenario. 

Again, the primary motivation for these requirements is to permit valid com- 
parative inferences to be drawn. Without a specific evolutionary scenario match- 
ing the specific target animal, we run the risk of simulating something with no 
biological counterpart. The fifth requirement is more subtle, and is a consequence 
of the fact that we typically cannot simulate genetics as it happens in the target 
animal, i.e. the details of the encoding of the mechanism in genome and ontogeny 
are not known. To finesse this difficulty, we assume that evolutionary change of
mechanism parameters known to be determined genetically in the animal is an 
adequate model of the unknown processes that take place during the animal’s 
reproduction. The necessary evolutionary change is supplied by, for instance, a 
genetic algorithm operating on the chosen parameters of the mechanism being
studied. 

From the foregoing discussion it should be clear that finding historical evidence
for a particular controller in fact amounts to performing a series of behavioural
experiments with parameters from generation to generation under the control of 
an evolutionary algorithm. The design of the fitness function for that algorithm 
can therefore be based on behavioural experiments of the kind alluded to above. 
It is less clear, however, what kinds of genetic operators are appropriate. In this 
paper we start with the standard genetic operators: reproduction, cross-over, 
and mutation. 



1.2 Survey of the Paper 

This paper reports a first attempt, using the methodology already outlined, to
provide historical evidence for the phonotaxis mechanisms studied by Webb and 
her collaborators. To this end, we implemented a standard genetic algorithm us- 
ing, for the assessment of individuals, a fitness function measured via the robotic 
implementation of the phonotaxis mechanism described in [3]. The experiments 
therefore used on-line evolution, somewhat similar to the work of Floreano and 
Mondada [6,7], except that the goal here was not to evolve robot controllers 
with particular capabilities but to follow a chosen evolutionary scenario, one
predetermined by the target biological system.

The paper continues with a description of the specific evolutionary scenario 
we chose to model. Details of the experimental methods follow. Section 3
describes the results of the on-line evolutionary experiments and the analysis of
the behaviour of the best evolved individuals. Finally, Sect. 4 discusses the
results, points to future work, and concludes that our experiments did provide
historical evidence for the hypothesised control mechanism investigated.



2 On-line Evolution of Cricket Phonotaxis 

In this section we fill in the requirements, mentioned in the previous section, 
in order to provide historical evidence for the control mechanism for cricket 
phonotaxis investigated by Lund et al.



2.1 Biological Background

Males of the species Gryllus bimaculatus attract females with their calling songs.
Females respond by navigating towards the sound. For this they possess a phonotaxis
mechanism that can both determine the directionality of the sound and its
specificity (each species’ calling song has its own specific pattern). We will here 
investigate the second property: selectivity, used for distinguishing conspecific 
males from heterospecific ones. 
An Evolutionary Scenario. Cricket calling songs are grounded in the genes of
the animal [8]. Therefore, cricket communication can be properly investigated 
from an evolutionary point of view. Otte [9] describes an evolutionary scenario 
that starts with two closely related species coming to live together in the same 
habitat. In addition, they happen to mate at the same time and have similar 
calling songs. Now, as females will not be able to distinguish between the two 
calling songs, interbreeding will occur. This situation is unstable in evolutionary 
terms and will result in either the divergence of the two species’ calling songs 
or the extinction of one of the species. In this work, we explored the former 
possibility. 

Evolvable Parameters. The neural controller under study is based on findings
in the neurophysiology of crickets. It consists of a two-layered neural network 
receiving input from the ears, and feeding its output into the motor system. 
Each layer consists of two leaky integrator neurons, one for each side of the body. 
The network accounts for both directionality sensing and selectivity. For a more 
detailed account we refer to the work of Lund et al. [3]. In the controller, several 
parameters represent genetically determined features of the neurons involved. 
Moreover, selectivity of phonotaxis in the females is tuned by the values of these 
parameters [3]. These parameters therefore provide the representation of the 
cricket genome for the model of cricket reproduction used in the experiments 
below. 

In our experiments we evolved seven parameters. For both layers of neurons, 
we take the decay rate (D1, D2), the lower threshold of activation (L1, L2),
and the duration of propagation after reaching the higher threshold (P1, P2).
In addition, we evolved the higher threshold of activation in neurons of only
the first layer (H1). The value of this parameter in the second layer was kept
equal to the lower threshold. We did not evolve the parameters of the (physical) 
peripheral auditory model as the carrier frequency was the same for both songs 
in our experiments. 

2.2 Materials and Methods 

For the simulation of Otte’s evolutionary scenario we designed an experimental 
setup in which specific emphasis was put on fully automated experimentation. 
This was necessary, since the on-line evolution of just 2 generations took an 
entire night. 



Materials. In our on-line experiments, the robot model [3] was able to move
freely on a fenced 240 × 240 cm table. It was externally powered and connected
to a controlling PC through a serial communication cable. A robust communication
protocol [10] and a command interpreter enabled the controlling PC
to drive the robot or send messages such as ‘start phonotaxis’ to the on-board 
software. Low-level processes, such as the phonotaxis behaviour, could thus be 
switched on and off by the PC. The implementation of the phonotaxis behaviour 
(the neural controller of interest) in the robot was essentially the same as
described by Lund et al. [3]. The physical robot was exactly the same.

Two loudspeakers, representing males of the two cricket species, were placed 
on the table. The controlling PC could play different calling songs through them 
on demand. An overhead camera viewed the entire table and a real-time vi- 
sual tracking system [11] was employed to measure the consecutive table co- 
ordinates of the robot while performing phonotaxis. These data were used to 
analyse phonotaxis performance and also, between phonotaxis runs, as input to 
a vector-based homing mechanism used to drive the robot to particular starting 
positions for fitness assessment (see below). 



Methods. Following Otte's evolutionary scenario, we started the experiment
with two populations of females showing equal preference for a single calling song.
In the course of the experiment, an evolutionary process would diverge the two 
populations’ preferences. On the other hand, unlike the scenario, we did not let 
the male calling song evolve freely. There are two reasons for us not to co-evolve 
calling song and female response. First, a simulated co-evolutionary process is 
not likely to follow the prescribed scenario, because of the self-organising prop- 
erties of a co-evolutionary system [12]. Secondly, we are only interested in the 
adaptivity of the female response, not of the male sound production. Therefore,
we ‘evolved’ the sound pattern by hand and focussed on the changes in the 
phonotaxis mechanisms of the two populations. Due to limitations of time, the 
experiment consisted of a single run. 

We composed two fixed calling songs, differing in one temporal parameter: 
syllable period (the time between the onset of two consecutive bursts of sound). 
This corresponds to Otte’s observations in which the calling songs had diverged 
through this parameter. One population of males (population 0) would emit 
calling songs at 55 ms syllable period, versus 25 ms in the other population 
(population 1). The syllable period of the initial song was 40 ms. Each male 
population was modelled by a loudspeaker that emitted its song during the 
experiments. 
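Because the two songs differ only in syllable period, measured onset to onset, each song can be sketched as a square-wave envelope. The syllable duration is not given in the text, so the `syllable_ms` argument below is a hypothetical parameter. It must be shorter than the period, which is 25 ms for population 1. The sketch also ignores any higher-level grouping of syllables.

```python
from typing import List

# Syllable periods (onset to onset) used in the experiment, in milliseconds.
SYLLABLE_PERIOD_MS = {"initial": 40, "population_0": 55, "population_1": 25}


def syllable_onsets(period_ms: int, song_ms: int) -> List[int]:
    """Onset times of successive syllables: one every period_ms milliseconds."""
    return list(range(0, song_ms, period_ms))


def sound_on(t_ms: float, period_ms: float, syllable_ms: float) -> bool:
    """True while a syllable is sounding; syllable_ms is a hypothetical duration."""
    if not 0 < syllable_ms < period_ms:
        raise ValueError("syllable duration must be positive and shorter than the period")
    return (t_ms % period_ms) < syllable_ms
```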

In the experiment, a genetic algorithm (GA) would generate (labelled) popu- 
lations of individuals which would be evaluated with the robot model. For doing 
this, the parameter set of each individual was downloaded into the robot consecu- 
tively. During fitness evaluation, the robot was exposed to the two sound sources 
simultaneously. Its task was to navigate towards the sound source matching its 
assigned label (population 0 or 1), i.e. its conspecific loudspeaker. 

The Genetic Algorithm. The seven parameters mentioned in Sect. 2.1 were directly
encoded as an array of floating point numbers. A standard GA [13] was
adapted for the evolution of several populations. Rank-based selection was used; 
mate choice was restricted to individuals from the same population. Interbreed- 
ing did not occur, because individuals that showed phonotactic preference to 
the heterospecific loudspeaker were not subject to the selection regime. Selected 
parents generated children by copying (90% probability) or two-point cross-over; 
children were then mutated (40% probability, 10% mutation range). Each population
consisted of 30 individuals, a number deliberately kept low for computational
reasons. We used a generational replacement scheme, as this is the natural
situation for crickets: every individual of the previous generation has died by
the time the young hatch [8].
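The GA step described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the linear rank weights, the reading of the 10% mutation range as a relative perturbation of each gene, and the fallback when no individual is eligible are hypothetical choices. The population size, copy and mutation probabilities, two-point cross-over, exclusion of individuals without positive fitness, and generational replacement follow the text.

```python
import random

POP_SIZE = 30     # individuals per population (from the text)
N_PARAMS = 7      # the seven directly encoded phonotaxis parameters
P_COPY = 0.9      # probability a child is a plain copy rather than a crossover
P_MUTATE = 0.4    # probability a child is mutated
MUT_RANGE = 0.1   # assumption: each gene is scaled by a random factor in [0.9, 1.1]


def rank_select(ranked, rng):
    """Rank-based selection: ranked[0] is the fittest and gets the largest weight."""
    n = len(ranked)
    return rng.choices(ranked, weights=[n - i for i in range(n)], k=1)[0]


def two_point_crossover(a, b, rng):
    """Child takes a[:i] + b[i:j] + a[j:] for two random cut points i < j."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:]


def mutate(genome, rng):
    """With probability P_MUTATE, perturb every gene by up to +/-MUT_RANGE (relative)."""
    if rng.random() < P_MUTATE:
        return [g * (1.0 + rng.uniform(-MUT_RANGE, MUT_RANGE)) for g in genome]
    return list(genome)


def next_generation(population, fitness, rng):
    """Generational replacement for one population (mates come from the same population)."""
    # Only individuals with positive fitness enter selection; those that
    # steered to the heterospecific loudspeaker leave no offspring.
    eligible = [(f, g) for f, g in zip(fitness, population) if f > 0]
    if not eligible:
        return [list(g) for g in population]  # hypothetical fallback, not from the text
    ranked = [g for _, g in sorted(eligible, key=lambda fg: fg[0], reverse=True)]
    children = []
    while len(children) < len(population):
        parent = rank_select(ranked, rng)
        if len(ranked) < 2 or rng.random() < P_COPY:
            child = list(parent)
        else:
            child = two_point_crossover(parent, rank_select(ranked, rng), rng)
        children.append(mutate(child, rng))
    return children
```

Because every child is produced from the previous generation only, no parent survives into the next generation, mirroring the cricket life cycle described above.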

Fitness Function Every individual was assessed four times (trials) on its
phonotactic performance. The trials started at different locations, all
equidistant from the conspecific loudspeaker. This ensured that the evolutionary
process could not exploit factors other than the sound structure and its
(constantly changing) directionality. The initial direction the robot faced was always towards the
heterospecific loudspeaker. Therefore, as the default movement of the robot was 
always straight ahead, failure to respond to any of the sound sources always 
resulted in poor performance, as the robot would navigate towards the wrong 
loudspeaker. Note that while the amplitude of the conspecific song was controlled 
for by this arrangement, that of the heterospecific song could vary substantially; 
a ‘good’ individual would have to demonstrate preference for its conspecific song 
against a range of amplitudes of distractor song. 

The tracker returned the movement track of the robot after each trial. From
this, six measures were calculated: mean distance (d_m), nearest distance (d_n),
and final distance (d_f) with respect to both loudspeakers. From these measures,
the fitness function estimated which loudspeaker the robot was heading for
during the trial by comparing the mean and nearest distances to them. If the
robot was heading for the heterospecific (wrong) loudspeaker, a fitness value of
-1 was assigned. Otherwise, a positive fitness value was assigned according to
the following formula:

$$\mathrm{Fitness} = \frac{d_s}{2\,d_m}$$

with d_s as starting distance. This formula converges to 1 when the robot's
movement track converges to a straight line to the conspecific loudspeaker,
since a straight approach at constant speed has a mean distance of d_s/2.

The fitness function returns the average of the values from the four trials.
If this average is negative, the individual is excluded from reproduction (as it
failed to localise a mate); with positive fitness, an individual is subjected to
the selection regime of the GA.
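A sketch of the per-trial fitness and its four-trial average, assuming tracks arrive as lists of (x, y) points. The exact formula in the text did not survive extraction cleanly, so the per-trial score here assumes the form d_s / (2 d_m), which equals 1 for a straight approach at constant speed that ends at the loudspeaker. The rule used to decide which loudspeaker the robot was heading for is likewise a hypothetical stand-in for the comparison of mean and nearest distances.

```python
import math


def _stats(track, source):
    """Distances to a source, their mean (d_m) and their minimum (d_n)."""
    d = [math.dist(p, source) for p in track]
    return d, sum(d) / len(d), min(d)


def trial_fitness(track, conspecific, heterospecific):
    """Fitness of one phonotaxis trial; track is a list of (x, y) tracker samples."""
    d_con, dm_con, dn_con = _stats(track, conspecific)
    _, dm_het, dn_het = _stats(track, heterospecific)
    # Hypothetical heading rule: the robot counts as heading for the wrong
    # (heterospecific) loudspeaker if it was closer to it on both measures.
    if dm_het < dm_con and dn_het < dn_con:
        return -1.0
    d_s = d_con[0]  # starting distance to the conspecific loudspeaker
    # Assumed form d_s / (2 d_m): a straight approach at constant speed that
    # ends at the loudspeaker has d_m = d_s / 2 and therefore scores 1.
    return d_s / (2.0 * dm_con)


def individual_fitness(trial_scores):
    """Average over the four trials; a negative value excludes the individual."""
    return sum(trial_scores) / len(trial_scores)
```

An individual that reaches the right loudspeaker in two trials and the wrong one in the other two averages to roughly zero, so mixed performance is penalised heavily.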

Evaluation of the Best Evolved Individuals To evaluate the performance of the
best evolved individuals in the final generation we performed syllable rate
experiments as described by Lund et al. [3]: we exposed the robot to calling songs
of different syllable periods, ten times to each song, alternating between two
different starting positions. These positions were mirror images of each other
about the central axis of sound propagation, so the robot's abilities to turn
both right and left were examined. The recorded tracks were transformed into
polar coordinates and from these we calculated the mean heading angle towards
the loudspeaker, averaged over the ten trials. This measure of performance, for
one individual and one type of calling song, can be compared statistically using
the Mann-Whitney U-test, a non-parametric test for comparing two independent
samples; its outcome indicates how significantly the two samples differ. A fuller
description of our evaluation methods can be found in the work of Lund et al.,
of Webb, and of Kortmann [3, 2, 14].
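The two evaluation steps can be illustrated with a short, dependency-free sketch. The heading measure below (mean absolute angle between each movement step and the bearing to the loudspeaker) is one plausible reading of the polar-coordinate analysis, not necessarily the one used by Lund et al. [3]. The U-test uses a normal approximation, which is rough for ten trials per group; exact tables, or a library routine such as SciPy's `mannwhitneyu`, would be preferable in practice.

```python
import math


def mean_heading_error(track, speaker):
    """Mean absolute angle (degrees) between each step's movement and the bearing to the speaker."""
    errors = []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        move = math.atan2(y1 - y0, x1 - x0)
        bearing = math.atan2(speaker[1] - y0, speaker[0] - x0)
        # Wrap the difference into [-pi, pi) before taking its magnitude.
        errors.append(abs((move - bearing + math.pi) % (2 * math.pi) - math.pi))
    return math.degrees(sum(errors) / len(errors))


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using mid-ranks for ties and a normal
    approximation without tie or continuity correction. Returns (U, p)."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tied block
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, math.erfc(abs(z) / math.sqrt(2))
```

In use, the ten per-trial heading errors of one individual on one song form sample `x`, and those of the individual it is compared with form sample `y`.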

We compared the best evolved individuals of populations 0 and 1 (which we
will call B0 and B1 respectively) with an individual from generation 0 (G0).
G0 was the individual described in the work of Lund et al. [3]; its parameter
set was hand-set and tuned to a 40 ms syllable period.



3 Results 

3.1 Evaluation of the Evolutionary Process 



In Fig. 1 the maximum, mean, and minimum fitness are depicted for each
generation in both population 0 (left) and population 1 (right). The mean fitness
fails to climb to 1, which is due to the short simulation time (40 generations).
Also, the behaviour depended strongly on the values of the parameters relative
to each other, so cross-over very often turned a pair of good parameter sets
into a pair of poor ones. Individuals that were not subject to cross-over,
though, adapted easily to the new calling song, as can be seen from the quick
convergence of the maximum fitness (2 and 6 generations in populations 0 and 1
respectively).





Fig. 1. Development of the fitness of population 0 (left) and population 1 (right). 
Shown are the maximum, mean, and minimum fitness. Also shown is the regression 
line through the mean fitness data points. 



Population divergence was determined by fitting a regression line through the 
mean fitness data points (slope 0.007 ± 0.003 and 0.010 ± 0.004 in population 0 
and 1 respectively). 
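The regression slope and its uncertainty can be computed by ordinary least squares, as in the sketch below. It assumes that the ± values quoted above are standard errors of the slope, which the text does not state explicitly, and any example data passed to it are synthetic, not the measured fitness values.

```python
import math


def slope_with_se(ys):
    """Least-squares slope of ys against generation number 0..n-1, and its standard error."""
    n = len(ys)
    if n < 3:
        raise ValueError("need at least three generations")
    mx, my = (n - 1) / 2, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in range(n))
    slope = sum((x - mx) * (y - my) for x, y in enumerate(ys)) / sxx
    intercept = my - slope * mx
    # Residual sum of squares gives the residual variance with n - 2 degrees of freedom.
    sse = sum((y - intercept - slope * x) ** 2 for x, y in enumerate(ys))
    return slope, math.sqrt(sse / (n - 2) / sxx)
```

Applied to a list of 40 per-generation mean fitness values, the function returns a (slope, standard error) pair of the kind reported above.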
3.2 Evaluation of the Best Evolved Individuals

From the results of syllable rate experiments with B0, B1, and G0, three
comparisons were made. In the tables below, the significance level p of each
difference is indicated (p < 0.001, p < 0.01, or p < 0.05); NS means ‘no
significant difference’.

Selectivity The performance of each individual with respect to different song 
types was analysed, revealing the song-selectivity of the individual. Table 1 shows 
that every individual tested is responsive to songs in a small range around its 
conspecific song. The responsiveness changes gradually with more remote syllable 
rates. 



Table 1. Comparison of phonotactic performance of each individual towards
conspecific song and other songs. See text for key to symbols.
Responsiveness Here, the performance of each pair of individuals with respect to
their own calling songs was compared. The comparisons show how well the
individuals are tuned to their conspecific calling songs. Table 2 shows that both
B0 and B1 respond very significantly better to their conspecific song than G0
does to its own (p < 0.001). Apparently the GA tuned the parameter sets very
well; recall that G0 was hand-set. There is also a slight difference in
responsiveness between B0 and B1 (p < 0.05).

Table 2. Comparison of the phonotactic performance of three individuals, relative to 
one another, with respect to their conspecific calling songs. 





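|            | G0 (40 ms) | B0 (55 ms) | B1 (25 ms) |
|------------|------------|------------|------------|
| G0 (40 ms) | -          |            |            |
| B0 (55 ms) | p < 0.001  | -          |            |
| B1 (25 ms) | p < 0.001  | p < 0.05   | -          |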







Adaptation Now, the performance of each best evolved individual (B0, B1) was
compared with that of G0, both with respect to the calling song of the evolved
individual. This comparison shows how well the evolved individuals adapted their
behaviour relative to the initial population. Table 3 shows that B1 performs
significantly better on its calling song (25 ms) than G0 does (p < 0.01). The
difference between B0 and G0 on the 55 ms syllable period song is even more
significant (p < 0.001).



Table 3. Comparison of the phonotactic performance before and after evolution with 
respect to the calling song of the evolved individual. 





|                    | B0 (55 ms) | B1 (25 ms) |
|--------------------|------------|------------|
| vs. G0 (same song) | p < 0.001  | p < 0.01   |



4 Discussion and Conclusions 

Divergence of calling song caused the female response mechanism to adapt ac- 
cordingly. This can be seen most clearly in Table 3. Selectivity for temporal 
patterns was maintained in the evolved mechanisms (see Table 1). In addition, 
the evolutionary process tuned the parameters responsible for selectivity more 
sharply than hand-setting did (Table 2). 

While this is perhaps unsurprising from a GA viewpoint - GAs are of course 
well-known as optimisation techniques - the development of novel GA results is 
not the focus of this work. Rather, we suggest the results reflect an evolution- 
ary adaptation in female response that might be demonstrable in real crickets, 
considering the genetic disposition of the parameters involved. 

Repeating the experiments to allow for contingency effects would strengthen
this suggestion and the model's validity. However, this is very costly and,
owing to time limitations, was not performed.

As was shown in Fig. 1, the best individuals adapted fairly quickly to a
hand-set change in a calling song parameter. This was due to the large mutation
range (10%); smaller mutations would have caused a more gradual increase of
maximum fitness. Thus, although the present work did not permit co-evolution of
female parameters and calling song structure, given this rapidity of evolutionary
response we expect that a true co-evolutionary process between male calling
song and female response would likely be successful in our experimental setup.
Adaptability to constantly changing environments is therefore a good topic for
further research. In addition, the principles of male sound production can be
studied, as was mentioned in Sect. 2.2.

Another topic for further study would be finding a better scheme for encoding 
the genome, for example a developmental scheme (e.g. see [15]) that is better 
able to handle the cross-over operator than our direct encoding scheme. 

In conclusion, the experimental results provided historical evidence for the
control mechanism for real cricket phonotaxis proposed by Webb and collaborators
[3]. We presented the first results of a new methodology that places hypothesised
mechanisms for animal behaviour in an evolutionary context using an embodied
agent and computational methods. We hope that the work presented here
will encourage more research aimed at not only elucidating mechanisms under- 
lying animal behaviour as they are currently found, but also their formation in 
evolutionary time. 
Abstract. In 1961, Herrnstein [4] famously observed that many animals
match the frequency of their response to different stimuli in proportion to
the reinforcement obtained from each stimulus type. Since then, a great
deal of research has attempted to elucidate the mechanisms underlying
this “matching law”, so far without a clear consensus emerging. Here,
we take the view that “choice behaviour” is a product of agent, environment,
and observer, and that “mechanisms of choice” are therefore not
to be located solely within the chooser. A simple model, employing the
novel methodology of evolving choice behaviour in a multi-agent system,
is used to demonstrate that matching behaviour can occur (in stable
environments) without any dedicated choice mechanism.



1 Introduction 

All behaviour is choice. R.J. Herrnstein (in [14]) 

In ALife, behavioural choice has been largely synonymous with action selection,
which can be loosely defined as the problem of choosing what to do, at any
given time, in order to further progress towards multiple, time-varying goals [8].
Action selection is concerned with choice between behavioural options that
relate to distinct goals (for example, feeding and drinking); let us call this a
type-1 choice scenario. Here, we consider a different situation, where choice
operates between different ways of satisfying the same goal; let us call this a
type-2 choice scenario. An example would be foraging amongst two kinds of plant
that differ in nutritive value. It is in such scenarios in biology that we can
observe matching behaviour, when animals allocate the frequency of their
responses to different stimuli in proportion to the reinforcement obtained from
each stimulus type.

We may distinguish two questions: (1) how (and why) animals display match- 
ing at all, and (2) how animals are able to track changing environmental contin- 
gencies. The latter is clearly the more complex, implicating learning, and we shall 
be concentrating on the former; exploring some minimal conditions on the inter- 
nal mechanisms required to support matching behaviour in stable environments. 
In Sections 2 & 3 we briefly review how the biological sciences of ethology,
behavioural ecology, and experimental psychology have approached both type-1
and type-2 choice, and note that ALife, in following an ethological precedent, has
(so far) concentrated almost exclusively on the former problem of action selection.
The matching law of Herrnstein [4] is then introduced as an alternative choice
paradigm, and it is argued that theories in both domains suffer from what will
be a theme throughout this paper: that behaviour and mechanism should not be
confused.

In pursuit of a more fruitful understanding of the mechanisms underlying
choice, Section 4 describes a multi-agent model in which simple reactive agents
are evolved in a type-2 choice environment. A hypothesis derived from Fretwell
[2] is put to the test: that matching behaviour is optimal if there is sufficient
competition for the resources. This model therefore introduces multi-agent
evolution as a method for studying choice behaviour.

The results (in Section 5) support Fretwell’s intuition, and indicate that 
matching behaviour (in stable environments) can arise without there being any 
dedicated mechanism of choice, and, indeed, without any internalisation of dis- 
tinct behavioural options at all. Sections 6 and 7 discuss these results in terms 
of how ALife can contribute towards the natural science attempts to understand 
choice; in particular, how it can make the relationship between behaviour and 
mechanism the object of study, rather than the legacy of unwarranted assump- 
tions. 



2 Biological Approaches to Behavioural Choice 

2.1 Ethology 

Ethology, through the observation of animal behaviour in natural contexts, seeks 
to understand the nature of the mechanisms underlying behaviour and their
evolutionary origins.¹ Ethological accounts of behavioural choice (in type-1 choice
scenarios) generally propose internal mechanisms that arbitrate between inter- 
nalised (and pre-existing) repertoires of behavioural options. The ALife approach 
to action selection has also relied on this foundation, and has investigated a broad 
class of arbitration devices, from hierarchies ([12]) to distributed networks ([7]). 

In previous work [10] this approach was criticised for committing the category
error of assuming that externally observed behaviours must have internal
mechanistic correlates. A behaviour is the joint product of an agent, an
environment, and an observer; thus the (agent-side) mechanisms underlying the
generation of any behaviour should not be assumed to be identical to the
behaviour itself. A simple animat model was used in [10] to illustrate that
effective action selection could occur without any internalisation of behaviour;
in that model, choice could be explained just as well in terms of perception as
in terms of action. In this paper, a similar model illustrates that the same
arguments apply in understanding the mechanistic basis of behavioural matching.

¹ Tinbergen, a founder of the discipline, actually specified four components of any
complete ethological account of a behavioural pattern: causation, development,
survival value, and evolution (from [9]).



2.2 Behavioural Ecology 

Behavioural ecology (BE) differs from ethology in that it is ostensibly not con- 
cerned with mechanism, but only with the adaptive rationale for observed
patterns of behaviour.² It is within this discipline, and within the complementary
discipline of experimental psychology (EP), that the investigation of behavioural 
matching has acquired greatest momentum. 

The study of matching in BE has focussed on foraging. Krebs & Kacelnik 
[6] present a discussion of various foraging strategies, and the environmental 
conditions under which they may do well. A typical foraging problem is a type-2 
choice environment, with two types of source providing reinforcement of a single 
kind (for example, food), but at different rates or with different probabilities. 
Two particular foraging strategies are of interest here: the zero-one rule (see
[11]) and the matching rule [4]. Given a type-2 environment, these rules make
conflicting predictions about how foraging agents should behave with regard to
the less profitable source (both rules agree that agents should maximise their
intake of the most profitable). The former predicts that an agent will either
always take the less profitable source upon encounter or never take it. The latter 
predicts that the agent will take the less profitable source at a rate that matches 
the difference in profitability between the two source types. 

In Section 3 we shall see how these conflicting predictions can be unified into a
single adaptationist framework, but for now let us note that although BE claims
to be unconcerned with questions of mechanism, it has been argued that there
is often an implicit tendency to credit the animal with complex computational
and representational abilities (see [3]).³ So either BE tells us nothing at all
about mechanism, or it may force us into making strong mechanistic assumptions 
without appropriate justification. 



2.3 Experimental Psychology 

It is to experimental psychology (EP) that we must turn in order to find more
explicit investigations of the mechanisms underlying behavioural choice. In the
case of matching, these investigations are primarily concerned with the more
complex issue of how animals learn to track changing contingencies, rather than
the simpler question of how (and why) they can match at all. It is perhaps
because of this that the status of matching in EP remains unclear, with some
believing that matching is the product of underlying learning rules (see, for
example, [5]), and others arguing that matching is itself the rule by which
animals determine their responses, and not a (mere) description of the results
(see [14]).

² Historically, BE derives from the split in ethology between those exclusively
interested in adaptation, and those who concentrated on mechanism (see [1]).

³ For those familiar with BE, this relates to the necessary assumption of a decision
variable in optimal foraging theory.

The confusion surrounding the nature of matching may also stem from two
sources: the frequent lack of connection between the contrived environments of
many EP experiments and ecologically plausible situations⁴, and the reliance of
mechanistic EP theories on the concept of response strength, which is
essentially another manifestation of an internal behavioural correlate.⁵ EP therefore
understates the environment both with respect to the behavioural problem (by 
encouraging arbitrarily constructed environments) and with respect to the mech- 
anistic solution (by relying on internal behavioural correlates). Indeed, given the 
co-dependence of behaviour on agent and environment, the former inevitably 
leads to the latter. 

Although the emphasis on learning in EP precludes direct comparisons being 
drawn with the present work, a bridge can be built by considering that the ques- 
tion of learning may productively follow an understanding of matching in stable 
environments, and that the problematic (and complex) mechanistic proposals 
prevalent in EP (see [14]) can be largely attributed to the confusion between 
behaviour and mechanism described above. 



3 The Matching Law and the Ideal Free Distribution 

In Krebs & Kacelnik [6] we find the following definition: “the matching law
states that the animal allocates its behaviour in proportion to the rewards it has
obtained from them.” More formally:

$$\log\left(\frac{F_A}{F_B}\right) = \log(k) + b\,\log\left(\frac{r_A}{r_B}\right) \qquad (1)$$

where F_A and F_B are the response frequencies to alternatives A and B,
r_A and r_B are the attained rates of reinforcement from the two alternatives,
and k and b are scaling parameters. The accepted fact that animals (and possibly
humans) often behave according to this rule in behavioural choice situations, both
inside and outside the laboratory, is undoubtedly of importance and demands
explanation.
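As a small worked illustration of Eq. (1), the sketch below rearranges it to give the response ratio F_B/F_A that the matching law predicts from the reinforcement rates; with k = b = 1 (strict matching) the response ratio simply equals r_B/r_A. The function name and the default parameter values are illustrative assumptions.

```python
def response_ratio_b_to_a(r_a, r_b, k=1.0, b=1.0):
    """F_B / F_A implied by Eq. (1).

    Eq. (1) gives F_A / F_B = k * (r_A / r_B) ** b, which rearranges to
    F_B / F_A = (r_B / r_A) ** b / k; this form also covers r_B = 0.
    """
    if r_a <= 0 or k <= 0 or b <= 0:
        raise ValueError("r_A, k and b must be positive")
    return (r_b / r_a) ** b / k


# Strict matching (k = b = 1) for a range of reinforcement rates of B relative to A.
for r_b in (1.0, 0.66, 0.33, 0.0):
    print(r_b, response_ratio_b_to_a(1.0, r_b))
```

Values of b below 1 ("undermatching") pull the predicted ratio towards 1, while k captures a constant bias towards one alternative.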

⁴ A popular methodology in EP involves creating experimentally contrived
abstractions of foraging environments, in which different “choices” are directly
presented to animals, with these choices being reinforced according to
experimenter-defined “reinforcement schedules”, which can be quite complex (see
[14] for a comprehensive review). Note that this methodology is not particular
to ‘behaviourism’.

⁵ EP theories often focus on how the “response strengths” are modified, and on how
they, in turn, determine the emitted response. But the “response strength” concept
is a prime example of a problematic intervening variable, crucial to the theoretical
coherence of EP, but empirically unsubstantiable from within it.

In 1972, Fretwell [2] introduced the concept of the Ideal Free Distribution
(IFD). Given a distribution of resources of different qualities, and a population
of (uniformly capable) foragers, the IFD describes the distribution of foragers
such that they all do equally well, regardless of the local resource quality, and
such that no-one can profit by moving elsewhere. The central intuition is that 
the high quality resources will tend to become overcrowded, and Fretwell argued 
that the only foraging strategy that is evolutionarily stable is that foragers match 
the relative frequency of their choices to the relative qualities of their options, 
in other words, that they follow the matching law. Of course, in ecological con- 
texts without this element of competition, the optimal behaviour may well be 
to concentrate exclusively on the higher quality source, in other words to follow 
the zero-one rule. For my money, this explanation of matching behaviour is sat- 
isfying as far as it goes; a multi-forager, limited resource environment is indeed 
ecologically plausible. But no claims are made about the mechanisms involved 
in this choice behaviour. 
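Fretwell's intuition can be made concrete with a toy best-response calculation. This sketch assumes that each patch delivers a fixed intake rate shared equally among the foragers on it (a 'continuous input' simplification that is not part of the text). Foragers move one at a time whenever moving raises their own intake, and the process stops when no one can profit by moving; in the example, forager numbers end up proportional to patch quality, that is, they match.

```python
def ideal_free_distribution(qualities, n_foragers):
    """Best-response dynamics: foragers move one at a time to another patch
    whenever that raises their own intake, until nobody can profit by moving."""
    if len(qualities) < 2:
        raise ValueError("need at least two patches")
    counts = [0] * len(qualities)
    counts[0] = n_foragers  # arbitrary start: everyone on the first patch

    def intake(i, n):
        return qualities[i] / n  # patch input shared equally among n foragers

    moved = True
    while moved:
        moved = False
        for i in range(len(qualities)):
            if counts[i] == 0:
                continue
            others = [m for m in range(len(qualities)) if m != i]
            j = max(others, key=lambda m: intake(m, counts[m] + 1))
            if intake(j, counts[j] + 1) > intake(i, counts[i]):
                counts[i] -= 1
                counts[j] += 1
                moved = True
    return counts


print(ideal_free_distribution([3.0, 1.0], 8))  # -> [6, 2]
```

Each move strictly raises the mover's intake, so the loop terminates; with only one forager the same code puts it on the better patch, the zero-one outcome.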

It is here that the current paper can contribute empirically, in playing out 
the above intuition. A model is described which employs artificial evolution to 
design internal mechanisms, for multiple agents, in a variety of choice scenarios. 
We are thus able to combine the mechanistic insight sought in EP, with the 
ecologically motivated adaptationist hypotheses of BE. 

4 The Multi-Agent Model

The essence of the model is a type-2 choice environment with multiple foraging 
agents. A GA is used to evolve the parameters of a reactive feedforward neural
network with a fitness function requiring efficient foraging. The hypothesis under 
test is simple: as the number of foraging agents increases, the behaviour of each 
agent should approximate that predicted by the matching law. Following the 
evaluation of this adaptationist hypothesis, the evolved mechanisms are analysed 
and the relevance of the concept of a “mechanism of matching” is questioned. 



4.1 Agent and Environment 

The agent(s) exist in a spatially continuous world (but with discrete time steps) 
containing 4 of each of 2 types of source, A and B. Each source type can poten- 
tially fully replenish the internal battery of the agent (initial level 200), which 
otherwise diminishes at a steady rate (1 per time step). Source type A always
replenishes the battery (r_A = 100%), but the replenishment probability r_B of
type B sources can be explicitly set at the beginning of each experiment. If the battery reaches 0
the agent dies, and trials terminate after a maximum of 800 time steps. 

After an agent has visited a source, the source disappears and reappears 
randomly in another part of the world. All sources appear in a limited area (200 
by 200 units) and the positions of all objects (sources and agents) are initialised 
with a minimum inter-object spacing of 40 units, a condition also adhered to 
when sources reappear following consumption. Each source is 16 units in radius
(each agent 5 units), and at full speed an agent covers 2.8 units per time step.
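A sketch of source placement under these constraints follows, using simple rejection sampling. The sampling method, the placement of agents inside the same 200 by 200 area, and the contact test used to decide that a source has been visited (body overlap, source radius plus agent radius) are assumptions for illustration.

```python
import math
import random

ARENA = 200.0        # sources appear within a 200 x 200 area
MIN_SPACING = 40.0   # minimum spacing between any two objects
SOURCE_RADIUS = 16.0
AGENT_RADIUS = 5.0


def free_position(occupied, rng, max_tries=10_000):
    """Rejection sampling: a uniform point in the area at least MIN_SPACING from all occupied points."""
    for _ in range(max_tries):
        p = (rng.uniform(0.0, ARENA), rng.uniform(0.0, ARENA))
        if all(math.dist(p, q) >= MIN_SPACING for q in occupied):
            return p
    raise RuntimeError("no free position found")


def initialise(n_sources=8, n_agents=3, rng=None):
    """Place 4 sources of each of 2 types plus the agents (hypothetical placement order)."""
    rng = rng or random.Random()
    placed = []
    for _ in range(n_sources + n_agents):
        placed.append(free_position(placed, rng))
    return placed[:n_sources], placed[n_sources:]


def touches(agent_xy, source_xy):
    """Assumed visit test: the agent's body overlaps the source disc."""
    return math.dist(agent_xy, source_xy) <= SOURCE_RADIUS + AGENT_RADIUS
```

After a visit, the consumed source would be moved by calling `free_position` with the positions of all other sources and agents, which keeps the 40-unit spacing rule in force.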

The agents possess 5 sensors, 4 of which are tuned to the two source types (in 
2 left/right pairs), and one of which reflects the battery level. The source sensors 
respond to the distance from the nearest source of each type to the agent, with
each sensor ranging linearly from 100 (at the source) to 0 (> 200 units away). If 
the source is to the left of the agent, the relevant left sensor will respond with 
20% greater activation (and vice-versa if the object is to the right). 
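The sensor response described above can be sketched as follows. The function names are hypothetical, the left/right decision uses the sign of a 2-D cross product (assuming a y-up coordinate frame), and the boosted value is left unclipped because the text does not say whether it is capped at 100.

```python
import math

SENSOR_MAX = 100.0    # activation at the source
SENSOR_RANGE = 200.0  # activation reaches 0 at this distance
SIDE_BOOST = 1.2      # near-side sensor responds 20% more strongly


def proximity(distance):
    """Linear response: 100 at the source, falling to 0 at 200 units or more."""
    return SENSOR_MAX * max(0.0, 1.0 - distance / SENSOR_RANGE)


def sensor_pair(agent_xy, heading, source_xy):
    """(left, right) activations for the nearest source of one type.

    heading is the agent's direction of travel in radians.
    """
    dx = source_xy[0] - agent_xy[0]
    dy = source_xy[1] - agent_xy[1]
    base = proximity(math.hypot(dx, dy))
    # Cross product of heading and source direction: > 0 means the source
    # lies to the agent's left (counter-clockwise from the heading).
    cross = math.cos(heading) * dy - math.sin(heading) * dx
    if cross > 0:
        return base * SIDE_BOOST, base
    if cross < 0:
        return base, base * SIDE_BOOST
    return base, base
```

Calling `sensor_pair` once for the nearest A source and once for the nearest B source yields the four source inputs; the fifth input is the battery level.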



4.2 Genotype and Phenotype 

The internal architecture of the agent comprises a simple feedforward reactive 
network, fully interconnected between layers, but with no internal recurrency 
within layers. There are 5 inputs feeding through to a 3 unit hidden layer, and 
then to a 2 unit output layer. The input units linearly scale the sensor values to 
range from 0 to 1, and all weights range from -1 to 1. Each neuron in the hidden 
and motor layers applies a sigmoid transfer function to the sum of its inputs 
(plus a threshold value), with the outputs ranging from 0 to 1. The outputs are 
scaled to range from -10 to 10 to set the wheel speeds. 

These weights and thresholds are specified by 26 real numbers of the 28-number
genotype (5 × 3 + 3 × 2 = 21 weights and 3 + 2 = 5 thresholds). The remaining
two numbers specify something less orthodox: how well the agent is able to
discriminate between the two source types.
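A sketch of the genotype decoding and forward pass, in Python. The order in which the 28 numbers are laid out (input-to-hidden weights, hidden-to-output weights, thresholds, then the two discriminability values) is an assumption, as is adding the threshold to the weighted sum; inputs are taken as already scaled to [0, 1].

```python
import math

N_IN, N_HID, N_OUT = 5, 3, 2


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode(genotype):
    """Split a 28-number genotype (assumed gene order) into network parts."""
    if len(genotype) != 28:
        raise ValueError("expected 28 genes")
    g = list(genotype)
    n_ih, n_ho = N_IN * N_HID, N_HID * N_OUT  # 15 and 6 weights
    w_ih = [g[h * N_IN:(h + 1) * N_IN] for h in range(N_HID)]
    w_ho = [g[n_ih + o * N_HID:n_ih + (o + 1) * N_HID] for o in range(N_OUT)]
    thresholds = g[n_ih + n_ho:n_ih + n_ho + N_HID + N_OUT]  # 5 thresholds
    alphas = g[n_ih + n_ho + N_HID + N_OUT:]                 # alpha_A, alpha_B
    return w_ih, w_ho, thresholds, alphas


def wheel_speeds(genotype, inputs):
    """Map five inputs in [0, 1] to (left, right) wheel speeds in (-10, 10)."""
    w_ih, w_ho, th, _ = decode(genotype)
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)) + th[h])
              for h, row in enumerate(w_ih)]
    out = [sigmoid(sum(w * y for w, y in zip(row, hidden)) + th[N_HID + o])
           for o, row in enumerate(w_ho)]
    return [20.0 * o - 10.0 for o in out]  # rescale (0, 1) to (-10, 10)
```

With an all-zero genotype every sigmoid outputs 0.5, so both wheels stop; evolution has to move the weights and thresholds away from zero to produce any movement.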

Each pair of sensors (S_i), as well as having an associated source type
(i ∈ {A, B}), also has an associated discriminability value α_i (0 ≤ α_i ≤ 1). If
both α values are 0, then both pairs of sensors will behave identically, responding
to the nearest source, regardless of type. As either α_i increases, the associated
sensors (S_i) are more likely to respond selectively to their particular source
type, i. In detail, the procedure is as follows. First, the distance from the agent
to the nearest source of each type (D_i) is calculated, from which the distance to
the nearest source of any type (D_n) is easily derived. Second, for each sensor
pair (S_i), if R < α_i (where R is a random number between 0 and 1), then S_i
responds to D_i, otherwise to D_n. Once the agent has perceived a source using
this procedure, it will continue to perceive it in the same way until a new
nearest source comes into range, at which point the procedure repeats.
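The discriminability procedure can be sketched as a small stateful object, one per sensor pair. The class name is hypothetical, and it assumes that "a new nearest source comes into range" means the identity of the nearest source of any type changes. R is drawn from [0, 1), so α_i = 0 never and α_i = 1 always yields a selective response.

```python
import random


class SensorPairPerception:
    """Decides whether sensor pair S_i responds to D_i (own type) or D_n (any type)."""

    def __init__(self, alpha, rng):
        self.alpha = alpha        # discriminability alpha_i in [0, 1]
        self.rng = rng
        self._source_id = None    # nearest source at the time of the last draw
        self._selective = False   # outcome of the last draw

    def perceived_distance(self, d_own, d_any, nearest_id):
        """Return the distance this pair responds to on the current time step."""
        if nearest_id != self._source_id:
            # A new nearest source has come into range: draw R once.
            self._source_id = nearest_id
            self._selective = self.rng.random() < self.alpha
        # Keep perceiving the same source in the same way until it changes.
        return d_own if self._selective else d_any
```

Holding the draw fixed per source avoids the sensors flickering between D_i and D_n on every time step, which the text rules out.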

This additional element of discriminability is admittedly unusual for a simple
feedforward network; it was included to give the agents some ability to
determine their own input, so that we could evaluate the possibility that
mechanisms underlying choice can be based on perception as well as action.



4.3 GA and Fitness Function 

A distributed GA⁶ was used to evolve a population of genotypes in each of 4
conditions in two different models: one model involving a single foraging agent
(the S model), and another involving 3 agents (the M model).⁷ In the S model,
an incremental fitness function based on the battery level B, summed over each
time step and averaged over 5 separate trials, rewarded agents that lived long
and foraged effectively. In the M model, each genotype gave rise to 3 genetic
clones, and the fitness of the genotype was assessed (as above) by following the
performance of just one of these agents (selected randomly), with the aim of
avoiding direct reinforcement of behaviour beneficial to the group as a whole.

⁶ Population 64, mutation rate 0.04 per bit, crossover rate 0.5. Runs of 1000
generations took about 10 hours on a single-user Sun 166 MHz SPARCstation.

⁷ In the M model the agents could not directly perceive each other.



5 Results 

Four conditions were first evolved and then tested in both the S and M models:
r_B ∈ {100%, 66%, 33%, 0%} (r_A = 100% in all conditions). Two were control
conditions, r_B = 100% and r_B = 0%. In the former, source types A and B are
functionally identical, and we should expect foragers in both models to attend
to each equally. In the latter, type B is always worth nothing, and so we would
expect foragers in both S and M models to attend exclusively to type A sources.
The experimental conditions were r_B = 66% and r_B = 33%. Here, we predict
that the single S forager will either continue to attend to A and B equally, or will
switch to exclusively attending to A. But in the M model, we predict that each
forager will attend to B in proportion to the difference in profitability between
A and B.

Fit agents generally evolved after about 500 generations in each condition 
(in both models), but in each case the population was left until 1000 generations 
had been completed. The fittest agents from each condition (in both S and M 
models) were tested 1000 times each, with the average number of visits to A 
and B sources being recorded. The entire set of evolutions (and analyses) was 
repeated 10 times to obtain overall averages. Fig 1 illustrates that the above 
predictions were indeed borne out; in the S model the zero-one rule is followed, 
but in the M model matching behaviour is observed. 
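The per-run summary used in the figures can be sketched as below: one ratio F_B/F_A per evolutionary run, then the mean and standard deviation across runs. The helper names are hypothetical, and `closer_to_matching` is an illustrative way of checking which prediction a result lies nearer; it is not a test used in the paper.

```python
import statistics


def run_ratio(visits_a, visits_b):
    """F_B / F_A for one evolutionary run, from visit totals over the 1000 tests."""
    return visits_b / visits_a


def summarise(ratios):
    """Mean and sample standard deviation of F_B / F_A across evolutionary runs."""
    return statistics.mean(ratios), statistics.stdev(ratios)


def closer_to_matching(ratio, r_b, r_a=1.0):
    """True if F_B / F_A lies nearer the matching prediction r_B / r_A
    than either zero-one outcome (0 or 1)."""
    return abs(ratio - r_b / r_a) < min(abs(ratio), abs(ratio - 1.0))
```

The helper is only informative in the experimental conditions (r_B = 66% and 33%), since in the control conditions the matching and zero-one predictions coincide.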

A second analysis, using the same evolved agents, revealed very similar results 
(fig 2). This time, instead of testing in the same environment as that present 
during evolution, a parallel with EP was drawn by testing the fittest agent from 
each condition in the contrived environment of a forced-choice discrimination 
test. Here, each agent was placed equidistant from a single A source and a single
B source, with no other sources present. The trial was stopped as soon as one
or other of the sources had been visited, and again each agent was tested 1000
times. Note that these tests always involved a single agent, even if evolution had
occurred in a multi-agent environment.⁸

Sensitivity to parameter settings was investigated by re-evolving all 8 con- 
ditions with either 3 or 5 sources of each type in the environment (10 complete 
re-evolutions in each case). Analysis was performed as before, and fig 3 illus- 
trates, again, qualitatively similar results to fig 1. 

Turning briefly to the evolved mechanisms (of the original agents, not those
re-evolved to check parameter sensitivity), the most immediate observation is
that there is a strong correlation between the difference in response to A and B,
and the degree to which the agents can discriminate between the sources (fig 4).

⁸ This test environment was meant to bear some similarity to the “Skinner-box”
experiments widespread in EP.

Fig. 1. Same Environment Testing. Fig. 2. Forced Choice Testing.
Conditions are 1: r_B = 100%, 2: r_B = 66%, 3: r_B = 33%, 4: r_B = 0%. These graphs
show the average (and standard deviation) rate of response to B (F_B) as a fraction
of the rate of response to A (F_A) over 10 evolutionary runs in each condition. From
each run, a single value was obtained by testing the fittest agent 1000 times either
in the same environment as evolution (fig 1) or in a forced-choice environment
(fig 2). The evolution of matching behaviour is observed in both M-model tests, and
the evolution of zero-one behaviour is observed in both S-model tests.



6 Discussion 

The above results clearly indicate that matching behaviour can evolve in
multi-forager, stable, limited-resource environments, and that agents with the same
fundamental architecture evolve a zero-one foraging strategy if competition is
absent. Four immediate observations can be made. First, the results are robust;
neither testing in a forced-choice environment nor re-evolving with different
source densities altered their pattern. Second, under the conditions of these
models, nothing more than a simple reactive neural network is required on the part
of the agent for either matching or zero-one behaviour (although these agents are
unable to track changing contingencies). Third, choice behaviour is apparently
mediated as much by perception as by action (fig 4).† Fourth, even in the
conditions in which agents appear to be ignoring source type B (when r_s = 0% in
both the S and M models), removal of the B sources affects their performance (mean
fitness difference 2.11%, statistically significant using a paired-sample t-test
with t = 2.95, df = 18, p < 0.01). Here the B sources are clearly influencing
behaviour, even though the agents are never “choosing” B.

† An interesting side issue here is that adaptive behaviour often depends on the
sensors presenting less than perfect discriminability. This runs counter to the common
intuition that the more accurately sensors reflect the external world, the better. 
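The paired-sample t-test reported above compares each agent's fitness with the B sources present against its fitness with them removed. A minimal sketch of the statistic follows. It assumes only the textbook formula t = mean(d) / (s_d / sqrt(n)) over the per-pair differences d; the function name and the example data are hypothetical.

```python
import math


def paired_t(with_b: list[float], without_b: list[float]) -> tuple[float, int]:
    """Paired-sample t statistic and degrees of freedom for matched measurements."""
    if len(with_b) != len(without_b) or len(with_b) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [a - b for a, b in zip(with_b, without_b)]
    n = len(diffs)
    d_bar = sum(diffs) / n
    # Sample standard deviation of the per-pair differences.
    s_d = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))
    if s_d == 0:
        raise ValueError("all differences are identical: t is undefined")
    return d_bar / (s_d / math.sqrt(n)), n - 1


# Hypothetical fitness values for four agents, with and without B sources.
t, df = paired_t([3.0, 4.0, 5.0, 6.0], [1.0, 3.0, 2.0, 5.0])
print(f"t = {t:.2f}, df = {df}")
```

For a paired test the degrees of freedom are the number of pairs minus one. For reference, the two-tailed critical value of t at p = 0.01 with 18 degrees of freedom is about 2.88, which the reported t = 2.95 exceeds.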









[Fig. 3 panels, left to right: S-MODEL(3), M-MODEL(3), S-MODEL(5), M-MODEL(5); x-axis of each panel: Condition]



Fig. 3. The distinction between matching and zero-one behaviour remains clear
with altered source densities. The two leftmost graphs report results with
3 sources, and the two rightmost with 5 sources. The graphs are to be interpreted
as in fig 1. All testing was in the same environment as evolution, and 10
evolutionary runs were performed in each condition.
Fig. 4. For each fittest agent in each condition (a total of 80 data points), the
ability of the agent to discriminate between A and B sources is plotted against the
average response pattern (over 1000 trials) to the sources. There is a clear
correlation.



These observations suggest that the idea of an explicit “choice mechanism” 
(or worse, a specific “mechanism of matching”) supervening on an internalised 
repertoire of behavioural options is inappropriate for these evolved agents. How- 
ever, this is not to say that the differences between the mechanisms are irrelevant 
to the differences between the behaviours; the forced-choice test results (fig 2) 
indicate that agents evolved in M (multi-agent) models still behave according 
to the matching law even when tested without conspecifics. 

If stable matching behaviour can arise without there being a dedicated 
“mechanism of matching”, it is natural to ask whether the importance of the 
matching “law” itself has been overstressed. One possibility is that the emphasis 
placed on matching derives from the fact that most (if not all) studies of matching 
(in biology) have taken place in variants of type-2 choice environments. However,
as Williams [14] argues, the way in which other choice alternatives exert their
effects “is a vital theoretical issue that remains to be clarified.” Current work
is addressing this issue by evolving choice behaviour in environments with both
type-1 and type-2 characteristics (thereby also permitting the joint examination
of the problems of matching and action selection). Future work may also address
the evolution of learning mechanisms that allow agents to track changing con- 
tingencies. It is perhaps here that parallels with EP might be more in evidence, 
and that the notion of a “mechanism of matching” may find greater application. 

It appears that the methodology espoused here provides a promising way 
of addressing the difficulties faced by both EP and BE in how they deal with 
mechanism (see Section 2). By making the distinction between behaviour and
mechanism fully explicit, and the object of study rather than assumption, we can 
avoid the pitfalls of either (1) searching for mechanistic correlates that underlie 
behaviours in arbitrary and contrived environments in EP, or (2) relying on 
the hidden mechanistic assumptions necessary for the construction of predictive 
models in BE. But there is an obvious pitfall for the ALife approach as well; 
the currency of minimal models and simple environments may bear very little 
relation to the complexities of their biological counterparts. The temptation to 
use ALife to attempt to displace biological knowledge must therefore be resisted. 
It is better that the insights from such simple modelling are used to engender 
shifts in conceptual stance of benefit to all intellectual parties. 

A word of caution is therefore necessary. I do not suggest that animals (or 
humans) choose in the way that these evolved agents choose. Nor do I ignore 
the fact that both humans and animals display matching behaviour in a much 
wider range of situations than has been investigated here, and are clearly able to 
rapidly modify their behaviour in response to continuously changing reinforcement
contingencies. All I suggest is that the assumption that observed matching
behaviour need be supported by a dedicated “mechanism of matching” super- 
vening on internalised behavioural correlates (or response strengths) should not 
be entirely trusted. 



7 Conclusions 

Marian Stamp Dawkins [1] has maintained for some time that a satisfying frame- 
work for understanding animal behaviour will require a new appreciation of Tin- 
bergen’s ideal that both adaptation and mechanism be considered jointly. As she 
says, the behavioural ecologists must start talking to the neurobiologists (or, in- 
deed, the experimental psychologists). I would like to conclude by suggesting 
that ALife can encourage this dialogue; in the present paper we have seen an ex- 
ploration of a BE phenomenon, in which an adaptationist hypothesis (Fretwell’s 
IFD) has been cashed out in mechanistic terms. We have seen that the search for 
a “mechanism of matching” is easily wrongfooted, given that choice (or, indeed, 
any behaviour) resides not solely inside the chooser, but in the joint activity 
of the chooser, its environment, and the observer. We have seen that matching 
behaviour in stable environments need not require anything more than a simple 
reactive mechanism. 

This work also benefits ALife itself in three ways: by highlighting the potential
importance of perception in choice (a theme further developed in [10]), by
demonstrating the evolution of choice behaviour in a multi-agent system, and 
not least by drawing attention to a rich body of behavioural choice phenomena 
to which the current paper serves as the briefest of introductions. 

There is of course a different tradition in ALife that is concerned with developing 
models of stringent biological plausibility, from which biological hypotheses can be 
directly assessed (see, for example, [13]).
Returning to Herrnstein’s aphorism that “all behaviour is choice”, one could
equally say that no behaviour is choice; behaviour is just observed, ongoing
agent-environment activity. We, as observers, demarcate and label portions of
this stream of activity, and sometimes call the boundaries choices. The foraging
agents in this paper do not employ any mechanisms of choice; how animals work,
of course, remains an open question.
Abstract. The object of this work is to examine the early evolution of the
nervous system in relation to adaptive behavior. The main questions are: how
the nervous system came into being, how a system able to ensure the adaptive
behavior of a being can be organized during evolution, and what basic rules of
construction are sufficient to create a workable nervous system without
specifying the details of the construction. The biological bases of the model
are the phyla Cnidaria and Porifera, because they stand at the beginning of the
genesis of nervous organization. We found that if, during the evolutionary
process, a kind of cell comes into being that is able to conduct electrical
stimuli, even in a rudimentary way, then this kind of cell improves behavioral
performance by itself, without containing any specific information about how to
organize the construction of this system.



1 Introduction 

Artificial Life offers a new tool for modeling biological systems. Among other
benefits, we can use it to better understand how the nervous system functions.
One way to achieve this is to examine a model of the nervous system together
with the resulting behavior. This makes it possible to observe how changes in
the nervous system influence the adaptivity of behavior.

The aim of our work is to create a computer model of the earliest evolutionary
stage of the nervous system. The main question we examine is: what are the simplest
organizing principles that make possible the formation of a simple conducting and
processing system that consists only of elements of the most primitive multicellular
animals and that can influence behavior adaptively, so as to confer selective
benefits? The development of this system cannot require sophisticated genetic
rules, because these are not yet present at this level of evolution. So the
formation can be based only on simple rules that follow from the functioning of
the building elements. The biological bases of the model are the phyla Cnidaria
and Porifera, as they stand at the beginning of the genesis of nervous
organization.
2 The animat and its environment

2.1 Anatomy 

Our task now is to create a model animal (animat) with the characteristics of the
most primitive Cnidarians. The animat is a tube-like being corresponding to the
gastrula stage, which is similar to the body plan of a Hydra. These animals
have in fact the most primitive nervous system, so it is not arbitrary to imagine our
animat as a Hydra-like being without tentacles (see Fig. 1.). This is suitable for our
purpose, because these animals have no special locomotor organs which would
demand an advanced neural apparatus for their operation [5], [10], [18]. The
locomotor apparatus of the animat consists only of the muscles of the body wall and
their motoneurons. This allows simple movements of the body, for example
bending and crumpling, like the movement of the body column of Hydra. In the
present model our animat is assumed to be sessile, because many species of
Cnidaria and all of Porifera are also sessile.

The movement of the animat is ensured by four longitudinal muscle-strips 
consisting of many muscular cells along the body. The working of a muscular cell 
causes the contraction of one section of the body which by itself ensures a very 
limited movement. Considerable movement is only possible if many muscular 
elements work in harmony with each other. 



2.2 „Histology” 

The body of the animat consists of three cell-types, all of which take part in the 
formation of behavior. These are: epithelial cells, muscular cells and nerve cells. 

2.2.1 Epithelial cells. The representation of this type of cell in our model is 
motivated by the fact that in animals with primitively developed nervous systems, 
mainly in Cnidarians, the role of epithelial cells is similar to that of nerve cells. These 
cells are able to receive stimuli and to conduct electric potential in a passive way and 
even to operate muscular cells [3], [16], [18]. In Porifera (which have no nervous 
system) these cells also play an important role in the conduction of stimuli [8], and 
many researchers derive neurons from these [9], [17]. These cells have a double role
in our model, just as in reality: on the one hand they serve as a kind of skin, on the
other hand they take part in receiving and conducting stimuli.

2.2.2 Muscular cells. These cells form a homogenous population similarly to the 
epithelial cells, that is, all muscular cells have the same properties. One of these 
properties is the stimulus threshold, another is the number of neurons that can 
innervate the same muscle cell, and a third is the ability of receiving stimuli from 
epithelial cells. (It is well-known that in the case of Cnidarians more than one 
motoneuron can innervate the same muscular cell, and a motoneuron can take part in 
the innervation of more than one muscular cell.) Features of epithelial cells and
muscular cells are encoded in the genome. 

2.2.3 Nerve cells. By function, neurons can be receptors, interneurons or
motoneurons. But at the lower levels of evolution these functions have not yet
separated from each other, so in Cnidarians (mainly in Hydra, which has the most
primitive nervous system) the same neurons can carry out several functions [11],
[19].

Connections between the cells are not determined in advance, therefore synapses 
of a particular cell are not encoded in the genome (i.e. “indirect encoding” [4]). The
formation of the synapses of a particular nerve cell is the result of ontogeny as 
determined by the number and the length of the processes of the given cell and the 
number of potentially synaptic partner cells within reach. 
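The ontogenetic rule described above can be sketched as follows. This is only an illustration under stated assumptions: each nerve cell is given a number of processes and a maximum process length, cells sit on the laid-out tube surface (x wraps around the circumference), and a neuron connects to the nearest cells within reach until its processes are used up. The nearest-first rule and all names are hypothetical; the paper does not say how partners are chosen among the cells within reach.

```python
import math
from dataclasses import dataclass


@dataclass
class Cell:
    x: float   # position around the tube (wraps at the circumference)
    y: float   # position along the body axis
    kind: str  # "nerve", "muscle" or "epithelial"


def tube_distance(a: Cell, b: Cell, circumference: float) -> float:
    """Distance on the laid-out tube surface, where x wraps around."""
    dx = abs(a.x - b.x)
    dx = min(dx, circumference - dx)
    return math.hypot(dx, a.y - b.y)


def grow_synapses(neuron: Cell, cells: list[Cell], n_processes: int,
                  max_length: float, circumference: float) -> list[Cell]:
    """Hypothetical ontogeny rule: connect to the nearest cells within reach,
    at most one partner per process."""
    in_reach = [c for c in cells
                if c is not neuron
                and tube_distance(neuron, c, circumference) <= max_length]
    in_reach.sort(key=lambda c: tube_distance(neuron, c, circumference))
    return in_reach[:n_processes]
```

Because only counts and lengths enter the rule, the same parameters produce different wiring whenever cell positions differ, which is the point of indirect encoding.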

Characteristics of the passive conductivity of cells, a time constant (τ) and a space
constant (λ), are also built into the model. The time constant is the time over
which the membrane potential decays to 1/e of its initial value. The space
constant of the membrane is the distance over which the membrane potential similarly
decays to 1/e of its initial value [7]. Both are derived from the cable equation, which
describes the passive electric properties of biological membranes. 
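For reference, the standard one-dimensional passive cable equation and its two limiting solutions, which yield the definitions of τ and λ given above, are written out below. Here V is the deviation of the membrane potential from rest. How the model discretises these equations is not stated in this paper.

```latex
\lambda^{2}\,\frac{\partial^{2} V}{\partial x^{2}} = \tau\,\frac{\partial V}{\partial t} + V
% spatially uniform membrane (\partial^2 V / \partial x^2 = 0):
V(t) = V_{0}\, e^{-t/\tau}, \qquad V(\tau) = V_{0}/e
% steady state (\partial V / \partial t = 0) on a long cable:
V(x) = V_{0}\, e^{-|x|/\lambda}, \qquad V(\lambda) = V_{0}/e
```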
Fig. 1. The bodily construction of the animat. Fig. 2. „Gene-mapping” of the animat.



2.3 Genetics 

The basic information about the construction and functioning of our animats is
encoded in a „genome”. The „life” of an animat begins with a short „ontogeny”,
when the animat starts to develop on the basis of information derived from the
genome. This phase is similar to that studied in the evolutionary modeling of neural
networks [4], [14], but in the work we report here genotypes do not go through an
evolutionary process. The possibility for this is built into the model for future work,
but in this paper we only make use of the ability of the system to create „phenotypes”
(animats equipped with a simple body and a nervous system) on the basis of
pre-encoded data. On the one hand this automates the creation of animats; on the other
hand, one of our main objects of study was the way a workable nervous system can come
into being on the basis of some very general information concerning its construction.
Utilizing the principle of indirect encoding means that not all the specific data for
every cell are encoded, but only the most important general rules and data, on the
basis of which the nerve cells and their connections can be created. This implies that
a given genotype can give rise to different phenotypes [13], [15].

The model genome consists of four main parts (see Fig. 2.). General information 
about the size and construction of the body is encoded in the first part, which 
determines the morphology of the animat. If we lay out the tube-like body, the 
position of each cell can be described by x and y coordinates, where the maximum
values specify the size of the body. 

The second part of the genome encodes the data for muscular cells, and the third part
the properties of epithelial cells. The homogenous cell populations of both kinds
play an important role in shaping the frame of the body. At the earliest
evolutionary stage epithelial cells can take part in operating muscular cells,
therefore data about this function must also exist in our model genome.

The largest part of the genome is the fourth, which stores the data for nerve cells.
These cells can form different local populations in the animat, so this part consists of
as many sections as there are populations. The x and y coordinates and the
synapses of a particular cell are not encoded; these develop during ontogeny,
when the phenotype comes into being through the decoding of the genotype.
Unlike in real animals, in our model all data are encoded in a single
„chromosome”. 
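The four-part genome can be sketched as a set of record types. The field names below are hypothetical; the grouping follows the description above (body size, homogeneous muscle and epithelial populations, one section per nerve-cell population). Cell positions are deliberately absent, so decoding must draw them, which is one way a single genotype yields different phenotypes. Uniform random placement is an assumption.

```python
import random
from dataclasses import dataclass, field


@dataclass
class BodyGenes:                 # part 1: size and construction of the body
    width: int                   # maximum x coordinate (around the tube)
    length: int                  # maximum y coordinate (along the body)


@dataclass
class MuscleGenes:               # part 2: one homogeneous muscle population
    threshold: float             # stimulus threshold
    max_motoneurons: int         # neurons that may innervate one muscle cell
    epithelial_input: bool       # can be driven directly by epithelial cells


@dataclass
class EpithelialGenes:           # part 3: one homogeneous epithelial population
    tau_ms: float                # time constant
    lambda_mm: float             # space constant
    drives_muscle: bool          # takes part in operating muscular cells


@dataclass
class NeuronPopulationGenes:     # part 4: one section per local population
    count: int
    tau_ms: float
    lambda_mm: float
    max_process_length: float    # in multiples of the cell size


@dataclass
class Genome:                    # all four parts on a single "chromosome"
    body: BodyGenes
    muscle: MuscleGenes
    epithelium: EpithelialGenes
    neurons: list[NeuronPopulationGenes] = field(default_factory=list)


def place_neurons(genome: Genome, rng: random.Random) -> list[tuple[int, int]]:
    """Ontogeny step: positions are not in the genome, so they are drawn here
    (uniformly at random, an assumption)."""
    return [(rng.randrange(genome.body.width), rng.randrange(genome.body.length))
            for pop in genome.neurons
            for _ in range(pop.count)]
```

Decoding the same Genome with two differently seeded generators gives two animats with identical cell populations but different layouts.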




Fig. 3. The animat and its environment 



2.4 Environment 

The environment is quite simple. It consists of an imaginary box in which the
animats exist (see Fig. 3.). We perform the tests in an isolated environment, which
means that there is only one individual in the box at a time. The stimulus is the
appearance of a piece of food, which slowly sinks to the bottom of the box. This is a
chemical stimulus for the animat, and its strength decreases exponentially with
distance. If the animat is able to catch at least one piece of food, the test is continued.
Feeding is successful if the animat not only touches the piece of food but also
„eats” it, in other words gets food particles into its coelenteron through the mouth
(see Fig. 4.). The most successful individuals are those that catch the most pieces of
food. 
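The stimulus described above can be sketched with two small functions. The exponential fall-off with distance follows the text; the constant sinking speed, the two-dimensional (horizontal position, height) coordinates and all parameter names are assumptions made for illustration.

```python
import math


def stimulus_strength(food_pos: tuple[float, float], receptor_pos: tuple[float, float],
                      s0: float, decay_length: float) -> float:
    """Chemical stimulus at a receptor: s0 * exp(-distance / decay_length)."""
    return s0 * math.exp(-math.dist(food_pos, receptor_pos) / decay_length)


def food_position(x0: float, z0: float, sink_speed: float, t: float) -> tuple[float, float]:
    """Food sinks at a constant speed (assumed) until it reaches the bottom, z = 0."""
    return (x0, max(0.0, z0 - sink_speed * t))
```

At each time step a receptor would read stimulus_strength(food_position(...), its own position, ...), so the signal grows as the food sinks towards the animat.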




Fig. 4. The movement of the animat 



3 Experiments 

The fully specified and operational model was part of the author’s Master's thesis and
is described therein [1]. A more complete report of the earliest results has been
submitted elsewhere [2].

The investigation of the model consisted of several parts, during which the 
program was run with a number of different settings of the parameters. The purpose 
was to test the abilities of the animat and to establish relations - if they exist - between 
the behavior and the values of these parameters. During the tests we studied the effect
of the time constant and space constant of the epithelial cells and the nerve cells,
as well as the effect of the number of nerve cells and the maximum length of neural
processes, on the adaptivity of the behavior of the animat.
4 Results and discussions

4.1 The effect of the time and space constants of the epithelial cells on the behavior

First we studied the behavior of animats that have only epithelial and muscular cells, 
but no nerve cells. These cells give a stable background for the functioning of nerve 
cells, therefore it is important to know the degree of responding ability of this system 
to the environmental stimuli. The time constant of the epithelial cells was studied in
the range from 1 to 50 ms (the physiological value is about 1-20 ms) and the space
constant in the range from 1 to 10 mm (the physiological value is about 1-5 mm).

Our results show that, qualitatively speaking, the excitability and the success of 
feeding are determined, in the first place, by the value of the space constant, and the 
time constant modifies them only slightly (see Fig. 5.). Feeding success increases
significantly with the space constant when its value is close to the physiological value. In
summary, the epithelial cells alone are able to ensure a minimally adaptive level of 
the behavior for the animat, similarly to the behavior of a nerve-free Hydra [6]. 

On the basis of these results we can say that a homogenous, diffuse conducting 
network - which is probably similar to the ancestor of the nervous system - can be 
sufficient to control the basic reactions of a simple living being to the environmental 
stimuli, which is consistent with the results of physiological experiments.
Fig. 5. The number of hits (pieces of food that are caught and eaten) depending on the space
constant of the epithelial cells, in the case of nerve-free network 



4.2 The effect of the number of nerve cells on the behavior 

In further experiments we studied how the behavior of the animats changes when
epithelial-like protoneural cells appear in the primitive nerve-free conducting system
created in the previous step. This corresponds to the evolutionary step during
which the ancestors of nerve cells arise from epithelial cells and begin to
organize into a simple network. In this phase nerve cells are rather similar to epithelial
cells, because they are able to connect only with their close neighbours. The
conduction is decremental, so the network is a "protoneuronal system" similar to the
present-day Hexactinellida [12]. The time and space constants of the nerve cells are like
those of epithelial cells. We chose values low enough that animats with nerve-free
networks are no longer able to feed successfully, so any change in behavior
should be attributable to the introduction of the nerve cells.

We found that the success of behavior is determined by the space constant and the 
number of nerve cells together (see Fig. 6.). The number of hits (pieces of food 
caught and eaten) increases with the number of nerve cells and this tendency is more 
definite if we increase the value of the space constant as well. Comparing this with 
the results of the nerve-free conducting system we found that although animats with a 
nerve-free conducting system are not able to respond to environmental stimuli at 
these parameter values, the appearance and multiplication of nerve cells increases the 
success of feeding behavior. 




Fig. 6. The number of hits depending on the number of nerve cells for different
values of the time constant (τ) and space constant (λ)

These results suggest that at the most primitive evolutionary stage of the nervous
system the ability to respond to environmental stimuli can increase through an
increase in the number of elements alone. This finding may have great evolutionary
importance, because it suggests that at the beginning of nervous organization
increasing the number of cells may be the simplest way to increase effectiveness.
At this level of evolution there is no need yet to develop complex centres in the
nervous system to increase its effectiveness, so there is no need to store the
information of the organization in the genome either.



4.3 The effect of the connections of nerve cells on the behavior 

Next we studied how the length of the processes of nerve cells, which determines
the maximum distance of connections, influences the success of behavior. The length
of the processes is given in proportion to the size of cells; during the simulation it was
set to 10 to 50 times the size of the nerve cells. In real animals with the simplest
nervous systems we do not find nerve cells with longer processes.

We found that the success of behavior increases only a little with the increasing 
length of the processes and the degree of increase of behavioral success depends on 
the overall number of nerve cells (see Fig. 7.). What we see is that if the animat has 
fewer than 200 nerve cells, the length of processes (and with this the maximum distance
of connections) has no effect whatsoever on the behavior. Considerable behavioral 
influence can be seen with a large number of neurons only, but the most important 
factor is not the length of processes, but the ability to develop processes at all. If the 
cells are able to create them, the behavior is influenced favourably even with short 
processes, and if the number of cells is between 200-500, the growth of processes has 
no further effect. But if the number of cells is more than 500, the adaptivity of 
behavior keeps on increasing with the maximum lengths of the processes. Therefore 
the success of the behavior depends first of all on the number of nerve cells, and the 
effect of the lengths of the processes only adds to this. The behavioral effects 
determined by the lengths of the processes (remember that these effects also depend 
on the number of nerve cells) can be explained by the fact that nerve cells have to 
reach a critical density to organize a workable network at all. In the case of few nerve 
cells their density is not sufficient to get close enough to each other and so they 
cannot form connections. 
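The critical-density argument can be made concrete with a rough estimate. Assuming N nerve cells placed uniformly at random on a body surface of area A, and ignoring edge effects, a cell expects about (N - 1)πr²/A other cells within reach r, and under a Poisson approximation a fraction exp(-(N - 1)πr²/A) of cells has no partner in reach at all. This is an illustrative back-of-the-envelope model with hypothetical numbers, not the analysis performed in the paper.

```python
import math


def expected_neighbours(n_cells: int, reach: float, area: float) -> float:
    """Mean number of other cells within `reach` of a given cell
    (uniform random placement, edge effects ignored)."""
    return (n_cells - 1) * math.pi * reach ** 2 / area


def isolated_fraction(n_cells: int, reach: float, area: float) -> float:
    """Poisson approximation to the fraction of cells with no partner in reach."""
    return math.exp(-expected_neighbours(n_cells, reach, area))


# Hypothetical body surface of 1000 square units and a reach of 1 unit.
for n in (100, 200, 500):
    print(n, round(isolated_fraction(n, 1.0, 1000.0), 3))
```

The isolated fraction falls off exponentially in N, so connectivity appears fairly abruptly once density passes a threshold, in line with the qualitative explanation above.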

As the density of nerve cells increases, the behavior becomes more effective (as seen
in both Figs 6. and 7.). However, at the first stage this growth is not due to the
development of a well-functioning neural network, but to the increasing number of
nerve cells that get close to the muscular cells, so that they work as motoneurons,
presumably in such a way that muscular cells are innervated by them directly (as
further results indeed show [1]), and this is already sufficient for a
certain level of adaptive behavior. As already mentioned, it is common in real
Cnidarians that nerve cells have more than one function, so they can receive stimuli
from the environment (receptor function) and innervate a muscular cell (motoneuron
function) at the same time [19].
Fig. 7. The number of hits depending on the number of nerve cells and the maximum length of
connections 



5 General discussion 

Summarizing the results: in a network of homogenous epithelial-like cells, considered 
as the starting point of nervous organization, the changes that adaptively influence the 
behavior are those that make conductivity more efficient. We found that such a 
change can cause the increase of the effectiveness of the behavior by itself, without 
any differentiation in the network or any development of special cell-types. There are 
numerous ways to increase the conductivity in the network. One of them is the 
increase of the space constant of cells in case of passive decremental conduction, but 
changing the biophysical parameters in living organisms is only possible within some 
boundaries. Evolution found a solution to increase the effectiveness of
forwarding electrical stimuli without increasing the space constant in an extreme way,
and this solution was the action potential. Our results suggest that another way to
increase the effectiveness of functioning is to increase the density of nerve cells.
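A short derivation shows why passive conduction alone is limited. Assuming the steady-state passive decay V(x) = V_0 e^{-x/λ} and a threshold V_th that the signal must still exceed at its target (with V_0 > V_th > 0), the maximum passive conduction distance is:

```latex
V(x) = V_{0}\, e^{-x/\lambda} \ge V_{\mathrm{th}}
\quad\Longleftrightarrow\quad
x \le x_{\max} = \lambda \ln\!\left(\frac{V_{0}}{V_{\mathrm{th}}}\right)
```

For example, with V_0/V_th = e the reach is exactly one space constant, and doubling it requires either doubling λ or squaring the ratio V_0/V_th. An action potential, which is regenerated along the way, escapes this bound; a denser network plausibly helps by shortening the distance each passive step must cover.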

These simple ways of increasing the adaptivity of the behavior probably played an 
important role during the early evolution of the nervous system, because at this stage 
the genome of the animals could not contain detailed information yet to encode the 
construction and functioning of an effective nervous system. Therefore the
organization of the system must have proceeded in an „ad hoc” fashion in the first
stage, and it is significant that even this early and simple state of the system is
already able to increase the chance of survival.

The effectiveness of this simple system can increase significantly by changing 
merely quantitative parameters, therefore the factors that determine the adaptivity of 
the behavior can be reduced to quantitative attributes. All of these suggest that the 
way that leads to the ancestral forms of the nervous system could be covered by 
simple steps. 
Abstract. Using evolutionary simulations, we develop autonomous
agents controlled by artificial neural networks (ANNs). In simple life-like
tasks of foraging and navigation, high performance levels are attained by
agents equipped with fully-recurrent ANN controllers. Examining several
experimental settings, differing in the sensory input available to the
agents, we find a common structure of a “command neuron” switching
the dynamics of the network between radically different behavioural
modes. In some of the models the command neuron reflects a map of the
environment, acting as a “place cell”. In others it is based on a
spontaneously evolving short-term memory mechanism. The resemblance to
known findings from neurobiology makes Evolved ANNs an excellent
candidate model for studying the relation between structure and function
in complex nervous systems.



1 Introduction 

The study of Artificial Neural Networks (ANNs) relates to Neuroscience in two 
ways. On one hand, ANNs serve as models for certain functions of biological 
neural networks [1]. Like any model, ANNs abstract away many key features of the
phenomena they try to describe, and yet they allow researchers to better
understand other, restricted aspects of natural neural mechanisms.
On the other hand, ANNs use insights gained from investigating the nervous 
system to create flexible and powerful computational models that have already 
found their way to many useful applications [6]. 

In recent years a novel paradigm has emerged in the study of ANNs. This
paradigm uses genetic algorithms [10] and evolutionary computation [4] to
develop ANNs. Work in this field comprises the development of “isolated” ANNs,
evolving to maximize a certain target function, on the one hand [8, 5], and the
development of “embedded” ANNs, serving as the control mechanism for an
autonomous agent, on the other [3, 7, 11]. In the latter case the agents perform
certain behavioural tasks, and their performance level in these tasks serves as
the basis for evolutionary selection.

This new paradigm of Evolved ANNs (EANNs) is clearly very interesting 
from the applicative point of view, opening new horizons in the development 
of robotic control mechanisms. However, its relevance to our understanding of
biological neural systems has not yet gained wide recognition.

In this paper we describe the development of EANN-controlled autonomous 
agents. We use unconstrained network architectures, and life-like behavioural 
tasks. Under these conditions, we show that networks maintaining steady
activation levels can evolve and, moreover, can serve to control agents that perform
at a remarkably high level, compared with algorithmic benchmarks. By analyzing
the non-trivial network structures that evolve in these agents, we demonstrate 
the existence of neurons whose functional repertoire strongly resembles that of 
“command” neurons and “place” cells known from biological models. The emer- 
gence of “place cells” was already described by Floreano and Mondada [3], who 
study homing navigation of a real robot. We show that the evolution of such 
“place cells” in a task requiring simple navigation skills is a robust phenomenon, 
occurring under various sensory capabilities. 

Another important result presented here is the emergence of a memory mech- 
anism. A simple memory-based behaviour in a small, bilaterally symmetrical 
ANN with 10 neurons and about 20 synapses was recently described by Jakobi 
[7]. We describe the emergence of a memory mechanism in an unconstrained 
network with over 200 synapses. This memory mechanism culminates in a single 
neuron, which in turn serves as a command neuron modulating a complex switch 
in network dynamics between two radically different operation modes. 

These results testify to the biological relevance of the evolutionary paradigm 
in the research of ANNs. Combined with the fact that the evolving network 
structure is purely an emergent evolutionary phenomenon, and with the avail- 
ability of full information about network structure and dynamics, these findings 
lead us to believe that EANNs will find their place as one of the leading
models for studying the computational activity of biological nervous systems. 

2 The Model 

The Environment and Behavioural Task: The basic environment consists 
of a 30x30 grid arena surrounded by “walls”, in which two kinds of 
resources are scattered. “Poison” is randomly scattered all over the arena; 
consuming it decreases the fitness of an agent. “Food” is also randomly 
scattered, but only within a restricted 10x10 “food-zone” at the southwest 
corner of the arena. The position of an agent is defined by the grid cell it 
is in and by its orientation: north, south, east or west. In some of the models 
the boundaries of the food-zone were marked, enabling the agents to sense them, 
while in other models this marker was absent. The agents’ behavioural task is 
seemingly very simple: to eat as much food as possible while avoiding the poison. 
The complexity of the task, as will become clearer later, stems from the limited 
sensory information available to the agents. In order to accomplish their task 
successfully, they have to find a way to navigate efficiently into the food-zone, 
remain there and consume as much food as possible. The life cycle of an agent (an 
epoch) lasts 150 time steps. Before the agent is introduced into the environment, 
250 poison items and 30 food items are randomly scattered in it. The fitness 
of the agent is calculated as the total amount of food consumed minus the total 
amount of poison eaten, normalized to give a maximal value of 1.
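The arena set-up and fitness measure can be sketched as follows. This is a minimal Python sketch, not the authors' simulator: the coordinate convention (x growing eastward, y growing northward), the rule that a cell holds at most one resource, and normalizing by the 30 food items (which makes 1 the maximum) are our assumptions.

```python
import random

ARENA = 30              # the arena is 30x30 grid cells
ZONE = 10               # the food-zone is 10x10, in the southwest corner
N_POISON, N_FOOD = 250, 30
EPOCH_STEPS = 150       # length of one epoch (life cycle)

EMPTY, FOOD, POISON = 0, 1, 2


def make_arena(rng):
    """Scatter resources for one epoch; returns grid[y][x].

    Assumptions: x grows eastward and y northward (so the southwest
    food-zone is x < ZONE and y < ZONE), and a cell holds at most one item.
    """
    grid = [[EMPTY] * ARENA for _ in range(ARENA)]
    zone_cells = [(x, y) for x in range(ZONE) for y in range(ZONE)]
    for x, y in rng.sample(zone_cells, N_FOOD):      # food only inside the zone
        grid[y][x] = FOOD
    free_cells = [(x, y) for y in range(ARENA) for x in range(ARENA)
                  if grid[y][x] == EMPTY]
    for x, y in rng.sample(free_cells, N_POISON):    # poison anywhere that is free
        grid[y][x] = POISON
    return grid


def fitness(food_eaten, poison_eaten):
    """Food minus poison, divided by the number of food items (assumed
    normalization), so eating all food and no poison scores exactly 1."""
    return (food_eaten - poison_eaten) / N_FOOD
```

For example, an agent that eats 10 food items and 4 poison items in an epoch scores (10 - 4) / 30 = 0.2 under this normalization.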

Network Structure and Dynamics: Each agent has an EANN “brain” 
consisting of 15 to 50 neurons (the number was fixed within a given simulation run). 
Of these, K_in neurons (5 to 7) are dedicated sensory neurons, whose values 
are clamped to the sensory input. Four neurons are designated 
as output neurons, their output serving as commands to the motor system. The 
network has fully recurrent connectivity, except that the sensory 
neurons receive no input from other neurons. All neurons other than 
the sensory neurons are binary McCulloch-Pitts neurons with zero threshold, 
and network updating is synchronous.
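A synchronous update of such a network can be sketched as below. This is a hedged illustration under our own assumptions: sensory neurons come first in the state vector and the four output neurons last, a neuron fires when its weighted input is strictly positive, and every non-sensory neuron reads all N neurons including itself (which is consistent with the N(N - K_in) chromosome length given below).

```python
def genome_length(n, k_in):
    """Number of synaptic weights: each of the n - k_in non-sensory neurons
    receives input from all n neurons (itself included)."""
    return n * (n - k_in)


def step(state, weights, sensory):
    """One synchronous update of a binary McCulloch-Pitts network.

    state   -- current values of all n neurons (sensory neurons first)
    weights -- n - k_in rows, one per non-sensory neuron, each holding
               n incoming weights
    sensory -- the k_in current sensory values, clamped onto the first neurons
    """
    k_in = len(sensory)
    current = list(sensory) + list(state[k_in:])   # clamp sensory neurons
    new_state = list(sensory)
    for row in weights:                            # every neuron reads the same old state
        h = sum(w * s for w, s in zip(row, current))
        new_state.append(1 if h > 0 else 0)        # zero threshold; h == 0 -> 0 (assumed)
    return new_state


def motor_outputs(state, n_out=4):
    """Assumed layout: the four output neurons are the last four neurons."""
    return state[-n_out:]
```

For the smallest networks used (N = 15, K_in = 5) this gives a chromosome of 150 weights.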

Selection and Variation: A “chromosome” for an agent having a network 
of N neurons consisted of N(N - K_in) real numbers specifying the synaptic 
weights of each of the synapses. Population size was fixed at 100. 
Reproduction was sexual, with mating probability proportional to fitness. Uniform 
point-crossover with probability 0.35 was applied to the parents’ chromosomes, 
after which point mutations were randomly applied to 2 percent of the locations 
in the genome. These mutations changed the corresponding synaptic weight by a 
random value between -0.6 and +0.6. Simulations lasted a pre-defined number 
of generations, ranging between 10000 and 30000. In the last generation, every 
agent was evaluated over 5000 epochs, and its fitness was averaged over them to 
obtain an accurate estimate. We often used the best agents obtained in the last 
generation of a given simulation as a population seed for a ‘continuation run’. 
This procedure usually yielded a small performance increase in the first one or 
two iterations. The best agents were obtained using one or two such continuation runs.
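One generation of this procedure might look like the following minimal sketch. We read “uniform point-crossover with probability 0.35” as taking each locus from the second parent with probability 0.35; other readings are possible. Shifting fitness values before roulette-wheel selection (fitness can be negative) is also our assumption, as are all function names.

```python
import random

CROSSOVER_P = 0.35      # per-locus swap probability (our reading of the text)
MUTATION_RATE = 0.02    # fraction of loci mutated per offspring
MUTATION_DELTA = 0.6    # weights change by a uniform value in [-0.6, +0.6]


def pick_parent(pop, fits, rng):
    """Fitness-proportional (roulette-wheel) selection. Fitness can be
    negative, so it is shifted to be positive first (assumption)."""
    lo = min(fits)
    weights = [f - lo + 1e-9 for f in fits]
    return rng.choices(pop, weights=weights, k=1)[0]


def uniform_crossover(a, b, rng, p=CROSSOVER_P):
    """Take each locus from parent b with probability p, else from parent a."""
    return [y if rng.random() < p else x for x, y in zip(a, b)]


def mutate(genome, rng, rate=MUTATION_RATE, delta=MUTATION_DELTA):
    """Perturb round(rate * len(genome)) distinct random loci (at least one)."""
    child = list(genome)
    for i in rng.sample(range(len(child)), max(1, round(rate * len(child)))):
        child[i] += rng.uniform(-delta, delta)
    return child


def next_generation(pop, fits, rng):
    """Produce a new population of the same size from the scored one."""
    return [mutate(uniform_crossover(pick_parent(pop, fits, rng),
                                     pick_parent(pop, fits, rng), rng), rng)
            for _ in range(len(pop))]
```

With 150 weights per genome, each offspring receives exactly 3 point mutations under this reading of “2 percent of the locations”.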

The Sensory System: Each agent was equipped with a basic sensor, a 
“somatosensor”. In addition, it could have a position sensor or a hunger sensor. 
The basic somatosensor consists of five probes, to each of which a sensory 
neuron is clamped. Four probes sense the grid cell the agent is located in and the 
three grid cells ahead. These probes can distinguish between an empty cell, 
a cell containing a resource (either poison or food, with no distinction between 
the two), an arena boundary, and a food-zone boundary when it is marked. Each 
of these cases was coded as an integer from 0 to 3. The fifth probe is a “smell” 
probe, returning -1 or +1 if the agent is currently in a grid cell containing 
poison or food respectively, and -1 or +1 at random otherwise. Thus, the agent 
has to integrate the input from two sensors in order to identify the presence of 
food or poison.
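The way the location probes and the smell probe combine can be sketched in Python. The text states only that the four cases are coded 0 to 3, so the particular mapping below, the cell-type strings, and the function names are hypothetical.

```python
import random

# Hypothetical mapping of the four probe cases to the codes 0..3.
EMPTY_CELL, RESOURCE, ARENA_WALL, ZONE_BORDER = 0, 1, 2, 3


def probe_code(cell, zone_marked=True):
    """cell is one of 'empty', 'food', 'poison', 'wall', 'border'."""
    if cell in ("food", "poison"):
        return RESOURCE                # food and poison look the same
    if cell == "wall":
        return ARENA_WALL
    if cell == "border" and zone_marked:
        return ZONE_BORDER
    return EMPTY_CELL                  # unmarked borders look empty


def smell(cell, rng):
    """Smell probe: -1 on poison, +1 on food, random -1/+1 anywhere else."""
    if cell == "poison":
        return -1
    if cell == "food":
        return 1
    return rng.choice((-1, 1))


def identify(here_code, smell_value):
    """Only the combination of both inputs identifies food or poison."""
    if here_code != RESOURCE:
        return "nothing"
    return "food" if smell_value == 1 else "poison"
```

A smell reading of +1 on its own is uninformative, since it also occurs at random on empty cells; only together with a “resource” reading from the probe on the current cell does it identify food.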

Fig. 1. An outline of the arena (southwest corner) and the controlling network, for an 
agent equipped with a somato+hunger sensor. The agent is marked by a small arrow 
on the grid. Curved lines indicate where in the arena each of the sensory inputs comes 
from. Output neurons and inter-neurons are all fully connected to each other. The 
output neurons are marked by a grating pattern.

A position sensor consists of two receptors, sensitive to the distance from the 
northern and eastern walls of the environment respectively, and returning a signal 
inversely proportional to that distance. Again, two sensory neurons are clamped 
to these values for the duration of a single time step. These two receptors are 
non-directional, thus giving no information about the orientation of the agent. 
The hunger sensor is a binary probe returning 1 if the agent has already eaten 
one or more food items during its life, and 0 otherwise. Note that the sensory 
system of the agents, with the exception of the position sensor, is purely local. 
Moreover, even in models where the agents were equipped with a position sensor, 
they had no sense of orientation, so navigation remained a challenging task.

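Both additional sensors reduce to very small functions. In this hedged sketch, distances are counted in cells with the cell next to a wall at distance 1 (to avoid division by zero), y grows northward and x eastward, and `scale` is a hypothetical gain; none of these details are given in the text.

```python
def position_signal(x, y, arena=30, scale=1.0):
    """Two non-directional receptors, each inversely proportional to the
    distance from the northern and eastern wall respectively."""
    dist_north = arena - y       # 0 <= y < arena, y grows northward
    dist_east = arena - x        # 0 <= x < arena, x grows eastward
    return scale / dist_north, scale / dist_east


def hunger_signal(food_eaten_so_far):
    """1 once the agent has eaten at least one food item, else 0."""
    return 1 if food_eaten_so_far > 0 else 0
```

Because the food-zone lies in the southwest corner, both position readings are small there, whatever the agent's orientation.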
The Motor System: The motor system of an agent consists of four motors, 
receiving binary commands from four output neurons. The first motor induces 
forward movement when activated. Two motors control right and left turns, 
inducing a 90-degree turn in the respective direction when only one of them is 
activated, and maintaining the orientation otherwise. The fourth motor controls 
eating, consuming whatever resource is available in the current location when 
activated. For eating to actually take place, however, there must be no other 
movement (forward step and/or turn) in the same time step. This both enforces 
simple motor integration and makes any attempt to eat costly.
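The decoding rules above fit in a few lines. This Python sketch is illustrative only; the function names and the clockwise ordering of headings are our own choices.

```python
HEADINGS = ("north", "east", "south", "west")   # clockwise order


def decode_motors(forward, right, left, eat):
    """Map the four binary output neurons to (step, turn, eats).

    turn is +1 for a 90-degree right turn, -1 for a left turn, and 0 when
    neither or both turn motors fire. Eating succeeds only if the agent
    neither steps nor turns in the same time step.
    """
    turn = (1 if right else 0) - (1 if left else 0)
    step = bool(forward)
    eats = bool(eat) and not step and turn == 0
    return step, turn, eats


def new_heading(heading, turn):
    """Rotate a heading name by a decoded turn."""
    return HEADINGS[(HEADINGS.index(heading) + turn) % 4]
```

For instance, `decode_motors(0, 1, 1, 1)` yields `(False, 0, True)`: both turn motors cancel out, so the eat command is honoured.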

3 Behavioural Results and Performance Estimates 

In order to evaluate the performance of the evolved agents, we wrote a set 
of benchmark algorithms. We restricted ourselves to reactive algorithms, i.e. 
algorithms having no memory. Within that restricted realm we can be quite confident 
that we wrote nearly the best possible algorithm; thus any agent obtaining a 
better score is clearly employing some kind of memory. The basic idea behind 
all these algorithms is the same, and can be summarized as: “When reaching 
a resource, eat it if and only if it is a food item. Unless you ‘believe’ you are 
inside the food-zone, try to navigate into it, and ignore all resources on your 
sides. If you ‘believe’ you are inside the food-zone, switch to an efficient foraging 
mode, and try not to leave the food-zone...”. This basic idea translates into 
different instructions, according to the available sensory input. For example, 
when equipped with a position sensor, but with no hunger sensor (and no sense 
of orientation), the activation levels of the two position receptors can act as a 
trigger for the change in behavioural mode.
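The defining property of these benchmarks is that the action is a pure function of the current inputs. A hedged Python sketch of the rule quoted above follows; the probe geometry (`ahead`, `left_side`, `right_side`), the probe codes, the zone-belief threshold and the action names are all hypothetical, and the authors' benchmarks were surely more refined.

```python
# Hypothetical probe codes: 0 empty, 1 resource, 2 arena wall, 3 zone border.
RESOURCE, ARENA_WALL, ZONE_BORDER = 1, 2, 3


def believes_in_zone(north_signal, east_signal, threshold=0.05):
    """Hypothetical trigger: both position receptors weak, i.e. far from the
    northern and eastern walls (near the southwest corner)."""
    return north_signal < threshold and east_signal < threshold


def reactive_action(here, ahead, left_side, right_side, smell, in_zone):
    """A memoryless policy: the output depends only on the current inputs."""
    if here == RESOURCE and smell == 1:
        return "eat"                     # eat if and only if it is food
    if ahead == ARENA_WALL or (in_zone and ahead == ZONE_BORDER):
        return "turn_right"              # turn at walls, and at borders when foraging
    if in_zone and ahead != RESOURCE:    # foraging mode: examine side resources
        if left_side == RESOURCE:
            return "turn_left"
        if right_side == RESOURCE:
            return "turn_right"
    return "forward"                     # exploration mode: ignore side resources
```

Since `reactive_action` keeps no state between calls, any controller that beats such a policy must be carrying information from earlier time steps.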

The details of the specific tasks depend on the type of sensor available to 
the agent (somatosensor alone, somato+position or somato+hunger) and on whether 
the food-zone borders are marked, which defines 6 different scenarios. Fig. 2 
summarizes the performance levels achieved by the best memoryless algorithms we 
designed, and those attained by the best evolved agents, for each of these scenarios.




Fig. 2. Comparison between memoryless algorithmic performance and that achieved by 
evolved agents. S = somatosensor, SP = somato+position sensor, SH = somato+hunger 
sensor. M = marked food-zone, U = unmarked food-zone.

As can be seen from the figure, the evolved agents do fairly well, compared 
to the algorithms, in all possible scenarios. The agents specifically excel where 
no memoryless algorithm does well. As shall be seen below, this is due to the 
development of memory under these scenarios. 

4 Function and Structure of Successful Agents 

We shall now focus on three cases where the evolved agents have developed 
particularly interesting behaviours. The three cases share the same behavioural 
task. They differ in the sensory input available to the agents. In the first case, the 
somatosensor was supplemented by a hunger sensor. In the second, the hunger 
sensor was replaced by a position sensor, and in the third case, the agents had 
to make do with the somatosensor alone.

In all three cases, the successful strategy relied upon a switch between two 
behavioural modes - exploration and foraging. Due to the different sensory ca- 
pabilities in each of these cases, different sensory cues were used to trigger that 
switch. However, a common structure could be identified, whereby the mode 
switch was always mediated by a central “command neuron”. The processing 
behind the action of this command neuron was different in each case, but its 
overall function was nearly identical, as we shall see. 

Case 1: Somato+Hunger Sensor, marked food-zone borders. Under 
this scenario, the successful agents’ behaviour remarkably resembles that of the 
best algorithm we wrote as a benchmark for that task. The exploration mode in 
this case consisted of moving in straight lines, ignoring resources in the sensory 
field that are not directly in front of the agent, and turning at walls. If the 
food-zone border is reached before the first food item is encountered, the border is 
ignored, and forward movement is preserved. The encounter with the first food 
item triggers the mode switch. The agent starts turning to resources on its right 
or left in order to examine them, and consumes nearly all the food items it 
encounters. Reaching the border of the food-zone at that stage causes it to turn, 
just as it turns at walls. This behaviour is not perfect, though. Sometimes 
resources are ignored, and there are cases where food-zone borders are crossed 
after the first eating event. These imperfections account for the gap between the 
agents’ performance and that of the algorithm.

In this experiment, the hunger sensor modulated the behaviour of the agent. 
When the neuron connected to the hunger sensor was clamped to a firing state 
(indicating that eating has already taken place), the agent employed a foraging 
strategy even outside the food-zone. When clamped to the quiescent state, food- 
zone borders were always crossed, and movement was mainly in straight lines. 
In this case the “hunger neuron” was especially suited to serve as a command 
neuron owing to its predefined connection to the hunger sensor. But as we shall show,
neurons with a similar function can also evolve spontaneously. 

Case 2: Somato+Position Sensor, unmarked food-zone borders. The 
basic behavioural modes in this case were similar to case 1, except that the agents 
did not turn at food-zone borders while foraging, since the borders were not marked. 
The switch between the two modes was based on the position in the environment: 
foraging inside the food-zone, and moving in straight lines when outside it. 
The agents employing this strategy often stepped out of the food-zone, but then 
switched to moving in straight lines, thus returning to it promptly.

It should be noted that without some form of memory it is hard to avoid leaving 
the food-zone from time to time, since knowledge of position alone, without 
orientation cues, does not suffice for that.

Examining the networks of successful agents revealed one common feature: 
certain inter-neurons had a position-sensitive response. Typically, these neurons 
would fire outside the food-zone, and remain quiescent inside the food-zone and 
in its close vicinity. Fig. 3(2a) depicts the mean activity of such an inter-neuron 
as a function of the location in the environment. Fig. 3(1a) shows that the hunger 
neuron in the previous experimental scenario (case 1) has a similar 
location-dependence (with inverse coding). Shifting the food-zone to the southeast corner 
of the arena revealed the different mechanisms behind these seemingly similar 
maps. The activity pattern of the hunger neuron in case 1 shifts according to 
the food-zone location (Fig. 3(1b)), being based upon the first encounter with 
food. The activity pattern in the second case, based directly on the agent’s 
position, remains fixed (Fig. 3(2b)), thus leading to near-zero performance.
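Location-dependent maps such as those in Fig. 3 can be obtained by logging a neuron's binary activity at every visited cell, pooling over many epochs, and averaging per cell. A minimal sketch; the `(x, y, fired)` record format is our assumption.

```python
from collections import defaultdict


def activity_map(records):
    """Mean firing rate per grid cell.

    records -- iterable of (x, y, fired) tuples, one per time step, pooled
               over many epochs; fired is 0 or 1.
    Returns {(x, y): mean activity}; cells never visited are absent.
    """
    total = defaultdict(int)
    visits = defaultdict(int)
    for x, y, fired in records:
        total[(x, y)] += fired
        visits[(x, y)] += 1
    return {cell: total[cell] / visits[cell] for cell in visits}
```

Rendering the returned values as grey levels (darker for higher activity) gives a map of the kind shown in Fig. 3.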




Fig. 3. Location-dependent activity maps of the respective command neurons. Darker 
means higher average activity. Somato+Hunger Sensor: (1a) Usual condition, (1b) 
Shifted food-zone (2.3% performance decrease). Somato+Pos Sensor: (2a) Usual 
condition, (2b) Shifted food-zone (100% performance decrease). Somatosensor: (3a) 
Usual condition, (3b) Shifted food-zone (no performance decrease).

Given the direct sensory information about the location in the environment, 
the mere existence of a “place cell” is not very surprising. Yet the function 
of these emerging neurons is very interesting. In the absence of a single cell whose 
sensory input can serve as a cue for a behavioural switch, the naturally emerging 
“place cells” provide this function. Indeed, clamping these place cells to active 
or quiescent states produces behavioural patterns identical to the ones caused by 
applying the same procedure to the hunger neuron in the previous experiment.
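A clamping experiment of this kind can be sketched by forcing selected neurons to fixed values inside the network update. The text does not say exactly how clamping was implemented; here, as an assumption, the clamped value is both what other neurons read and what the neuron holds after the update. Sensory neurons are assumed to come first, and the update is binary with zero threshold.

```python
def step_clamped(state, weights, sensory, clamp=None):
    """Synchronous binary update with some neurons clamped.

    state   -- current values of all neurons (sensory neurons first)
    weights -- one row of incoming weights per non-sensory neuron
    sensory -- current sensory values, clamped onto the first neurons
    clamp   -- {neuron index: 0 or 1} for neurons forced to a fixed state
    """
    clamp = clamp or {}
    k_in = len(sensory)
    current = list(sensory) + list(state[k_in:])
    for i, v in clamp.items():
        current[i] = v                      # other neurons see the clamped value
    new_state = list(sensory)
    for row in weights:
        h = sum(w * s for w, s in zip(row, current))
        new_state.append(1 if h > 0 else 0)
    for i, v in clamp.items():
        new_state[i] = v                    # the clamped neuron stays fixed
    return new_state
```

Running an agent with a candidate command neuron clamped to 1, and again clamped to 0, and comparing the resulting behaviour is the kind of test described above.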

Case 3: Somatosensor only, unmarked food-zone borders. This particular 
case turned out to be especially interesting. Due to the purely local sensory 
information and the lack of any positional cue, it is hard to adopt a strategy that 
forages more efficiently inside the food-zone. The two basic strategies, always 
moving in straight lines or always examining every resource in the sensory field, 
both yield near-zero performance. Our best memoryless benchmark algorithm 
improved on this by using the total number of resources in the sensory field 
as a cue for switching between foraging and exploration behaviours, scoring an 
average of 0.16.

The evolved agents achieved much better results, as is evident from Fig. 2. The 
observed behaviour of these successful agents is characterized by an elevated 
tendency to examine resources in their sensory field, which lasts for some time after 
eating. The tendency to turn to resources right after eating is 74% higher than 
it is a long time after eating*.

This behaviour led us to believe that the successful agents have some sort 
of short-term memory, remembering the last eating episode for several time- 
steps. Such a memory mechanism could take the role of a command neuron 
in this scenario, switching to foraging behaviour when food is encountered and 
maintaining it for a short while (a reasonable strategy, given the concentration of 
food in a restricted area). Examining the “brains” of these agents, we managed 
to confirm the existence of such a memory device. Fig. 4(a) shows the average 
activity of such a memory command neuron, as a function of the time elapsed 
since the last eating episode. 
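A curve like Fig. 4(a) can be computed from per-step logs of the neuron's activity and of eating events. In this minimal sketch, the step at which eating occurs counts as lag 0 and steps before the first eating episode of an epoch are skipped; both are our assumptions.

```python
from collections import defaultdict


def activity_since_eating(epochs, max_lag=20):
    """Average activity as a function of steps elapsed since the last eating.

    epochs -- iterable of (fired, ate) pairs of equal-length 0/1 lists,
              one entry per time step of an epoch
    Returns {lag: mean activity} for lags 0..max_lag that were observed.
    """
    total = defaultdict(int)
    count = defaultdict(int)
    for fired, ate in epochs:
        last_meal = None
        for t, (f, a) in enumerate(zip(fired, ate)):
            if a:
                last_meal = t                 # eating resets the clock
            if last_meal is None or t - last_meal > max_lag:
                continue
            total[t - last_meal] += f
            count[t - last_meal] += 1
    return {lag: total[lag] / count[lag] for lag in sorted(count)}
```

A sharp dip at small lags followed by recovery, as in Fig. 4(a), is the signature of the memory neuron described next.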




Fig. 4. Activity patterns of a memory neuron: (a) The average activity level as a 
function of the time elapsed since the last eating episode. The average is taken over all 
the steps in 1500 distinct epochs. (b) The distribution of quiescent period durations. 
The solid line corresponds to an exponential distribution with λ = 0.26 (shifted by two 
timesteps). (c) A raster plot of the activity in 3 different epochs. Vertical lines mark 
steps where the neuron fired. The small triangles above them mark the feeding events.

As can be seen from Fig. 4(a), the memory command-neuron undergoes a 
sharp inhibition immediately after eating. Its probability of firing then increases 
as time since the eating episode elapses. The emerging dynamics of the cell is 
sustained activity, interrupted by periods of inactivity triggered by the feeding 
events. Fig. 4(b) shows the distribution of post-eating quiescent period durations. 
Fig. 4(c) shows a raster plot of the activity of the same cell, where the small 
triangles indicate the times of feeding events. As is evident from the figure, the 
normal state of the neuron is active, and it is inhibited only following feeding. After a 
“refractory period” that lasts a single time-step, the length of the quiescence 
period is geometrically distributed. That is, the probability of resuming firing at 
any time-step after eating is constant, but once the neuron starts firing, it 
retains the firing state until the next feeding. The average quiescence period after 
eating lasts 5.74 steps. Approximating by a continuous exponential distribution 
function (shifted by two timesteps), we found the rate constant of the process to be 
λ = 0.26, which corresponds to a mean memory maintenance of 5.85 time-steps 
(3.85 after a lag of two timesteps, since 1/0.26 ≈ 3.85). The stochastic nature of 
such a memory mechanism emerging in a binary neural network can in part be 
accounted for by the random distribution of resources in the environment, as well 
as the partly random input from the somatosensor.

* In the case of marked food-zone borders, the trigger from the memory-based 
command neuron also determined whether to turn at food-zone borders or not.
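The arithmetic behind the 5.85-step figure, and a discrete process with the same mean, can be checked with a short Python sketch. Treating λ as a constant per-step probability of resuming firing after a fixed lag is our simplifying assumption; the reported value comes from fitting a continuous exponential.

```python
import random

LAMBDA = 0.26   # fitted rate constant reported for Fig. 4(b)
SHIFT = 2       # the fitted exponential is shifted by two time steps


def mean_quiescence(lam=LAMBDA, shift=SHIFT):
    """Mean of an exponential distribution with rate lam, shifted by `shift`."""
    return shift + 1.0 / lam


def sample_quiescence(p, lag, rng):
    """Discrete analogue (assumption): after `lag` silent steps the neuron
    resumes firing on each later step with constant probability p. Returns
    the step, counted from eating, at which it fires again."""
    n = lag
    while True:
        n += 1
        if rng.random() < p:
            return n
```

`mean_quiescence()` returns 2 + 1/0.26 ≈ 5.85, and the waiting times drawn by `sample_quiescence(0.26, 2, rng)` average to the same value, because a geometric waiting time with per-step probability p has mean 1/p.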

Fig. 3 (3a) shows the mean activity of the same “memory neuron”, as a func- 
tion of the location in the environment. The activity pattern is similar to that 
of the command neurons in the two previous cases. The location-dependent ac- 
tivity in this case is a byproduct of the memory mechanism - the neuron is less 
active (on average) in the vicinity of food, because eating shunts its activity
for several time-steps. This leads to a more graded “map”. Similar to the case 
of the hunger neuron, shifting the food-zone from its original location to the 
southeast corner (fig. 3(3b)) causes no malfunction, as the location-dependent 
activity shifts accordingly. 

The simple utilization of memory in this experiment induces a considerable 
increase in performance level. Although these agents do not reach the scores 
attainable in the presence of global positioning cues, they do remarkably well, 
surpassing by far every memoryless algorithm. In fact, in the regime where the 
food-zone borders are not marked, the most successful agents lag only slightly 
behind the best memoryless algorithm that operates given explicit knowledge of 
the hunger state via a hunger sensor. 



5 Summary 

We analyzed the control mechanisms evolving in autonomous agents performing 
a simple foraging task and governed by a recurrent ANN, without any prede- 
fined network architecture. In order to succeed in their behavioural task, the 
agents had to develop a mechanism for switching between two distinct types of 
behaviours, foraging and exploration. In all six experimental scenarios examined, 
the evolved agents managed to closely match the best memoryless algorithms 
for the task, and in a few of the cases surpassed them by far. This was achieved 
with completely unconstrained network architectures. 

We discussed in detail three experimental scenarios, differing in the sensory 
input that was available to the agents. In all three cases, a similar mechanism 
has evolved, whereby an inter-neuron acts as a “command neuron”, translating 
the sensory input into a binary command that modulates the dynamics of the 
whole network and switches between the two behaviours. In the case where the 
agents had the most limited sensory capabilities, a memory mechanism emerged, 
which became the basis for such a command neuron. 
Cells in the rat hippocampus show complex location specificity [9]. The 
location-specific activation patterns presented here are not nearly as 
sophisticated, yet they do seem to share the same basic characteristics. The function 
of the command neurons in our simulated agents resembles the CPR command 
neuron found in Aplysia, which modulates the feeding behaviour based on 
certain sensory stimuli [12]. This resemblance to known findings from 
neurobiology suggests that once better scalability is achieved, EANNs may provide an 
excellent vehicle to study the fundamental problem of the relation between 
structure and function in complex nervous systems.

The results presented here were obtained using the crudest form of genetic 
encoding: direct specification of all the synaptic weights in the genome. This 
has a direct limiting effect on the scalability and speed of the evolutionary 
process. With the application of more sophisticated genetic encoding schemes, 
such as grammatical or ontogenic encodings [2, 8], we may expect the evolution of 
larger recurrent EANNs that process more complex sensory input to achieve more 
intelligent behaviours. The accessibility of such models to thorough analysis 
should make them an important tool in the tool-chest of modern neuroscientists.
Abstract. The relationship between evolution (the genetic and developmental 
processes of an evolutionary system) and modularity (its support for modular 
structures) is explored. Modules are defined as structures with a common origin, 
either evolutionary or developmental. In the former case, processes supporting 
modularity operate on the phylogenetic level; in the latter, on the ontogenetic 
level. Three such processes are identified (duplication, divergence, 
co-vergence). The existence of these processes determines the system’s support for 
modularity. Modules are analysed in the particular context of artificial neural 
networks (ANNs), where they appear as subnetworks. Gruau’s cellular 
developmental encoding is used as an example, and an extension is proposed 
which better supports modularity.



1 Introduction 

The issue of modularity in living systems appears in various contexts, addressed 
by different disciplines. In artificial life, the issue of modularity appears in all 
these contexts, stressing the need for an inter-disciplinary approach [1]. We mainly 
focus on the relation between evolution and modularity; more specifically, what 
aspect of an evolutionary system determines its support for modularity? Does a 
system support modularity, and if it does, to what extent? In particular, in the context 
of evolved artificial neural networks (ANNs), we answer this question for several 
versions of the cellular developmental encoding.

Common-Origin Definition of Modules. In the general sense, a module is a 
subsystem that can be defined in terms of its function and/or structure. The criteria of
similar function and similar structure are essential, but not very precise. Instead, in 
our evolutionary framework we define modules as subsystems with common origin, 
tacitly assuming a correlation between this requirement and the former two. Common 
origin includes both common evolutionary origin (phylogenetic) and common 
developmental origin (ontogenetic). We stress that common origin should not be 
viewed as the only mechanism for the emergence of modules. For example, unrelated 
modules can acquire similar functions. 

Evolved ANNs provide a good study ground for modularity, because relations 
between modules (which are subnetworks), and their internal structure, are relatively 
easy to analyse. A better understanding of modularity can also contribute practical 
ideas for evolving ANNs, which are widely used as controllers. Modular ANNs have 
several advantages (tested or hypothesised): smaller number of components, easier re- 
adaptation from one task to another, more powerful generalisation, etc. [5, 6, 1]. 
2 Genetics of Modularity

Duplication and Divergence Create Modules. Modules descending from a 
common origin are created by some duplication event. Duplication alone can account 
only for identical modules: similar (but different) modules are a result of subsequent 
divergence. As two sibling modules differentiate, they can assume different 
functions. If two contradictory adaptive constraints had been active on the ancestor 
module, duplication and divergence can resolve this contradiction, ‘lifting’ evolution 
from a temporary dead end [1]. 

Two Pathways: Phylogeny and Ontogeny. Duplication and divergence 
processes can occur on different timescales. We identify two cases: one involving 
processes operating on the population level (phylogeny), and one on the organism 
level (ontogeny). In the first case, the phenotype is encoded by the genotype directly: 
every module is encoded by a corresponding gene. A gene duplication event (on the 
genotype level) can result in two copies of the gene and the encoded module. 
Subsequently, genetic operators can alter the two copies independently, causing their 
divergence. We refer to this case as phylogenetic module duplication. 

In the second case, the phenotype of the organism is the result of a developmental 
process. Modules are the result of duplication processes occurring during 
ontogeny. The developmental process is guided by the genotype, but indirectly, so 
there is no one-to-one correspondence between genes and modules. For convenience, 
we term this case ontogenetic module duplication. Table 1 lists three module-related 
mechanisms and their form in the two cases.




Figure 1. Illustration of the relationship between phylogeny and ontogeny. The 
horizontal arrow indicates evolutionary change, the vertical one developmental time. 
Left: no ontogeny; the phenotypic tree (white) duplicates the genotypic tree (grey). 
Right: ontogeny (black) links phenotypes and genotypes. Note the similar shape of trees. 

In both cases the end result is modularization on the phenotype level, induced by 
changes occurring on the genetic level, guided by selection. Ontogeny is an 
additional step linking the genotypic and phenotypic levels. Fig. 1 depicts the
relation between the phylogenetic and ontogenetic planes. In nature, ontogeny 
reflects phylogeny [2, 11]. 

A notable difference is that in the ontogenetic duplication case there can be genetic 
operators that affect both modules equally. This is not possible in the phylogenetic 
duplication case, because sibling modules are encoded by independent genes. Such 
parallel adaptation can be useful: a change somewhere in the development tree is 
analogous to a past change in the ancestor module. In conclusion, ontogenetic module 
duplication has better potential support for modularity.
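The difference between the two pathways can be made concrete with a toy model. In this hypothetical Python sketch a module is just a list of weights; `duplicate_gene`, `mutate_gene` and `build_phenotype` are illustrative names, not part of any published encoding, and the developmental side is reduced to expanding references to shared module definitions (divergence is not modelled).

```python
import copy

# Phylogenetic (direct) encoding: one gene per module, so the phenotype is
# simply a copy of the genome.


def duplicate_gene(genome, i):
    """Gene-duplicating mutation: insert a copy of gene i right after it."""
    return genome[:i + 1] + [copy.deepcopy(genome[i])] + genome[i + 1:]


def mutate_gene(genome, i, j, delta):
    """Ordinary mutation: changes weight j of the module encoded by gene i only."""
    g = copy.deepcopy(genome)
    g[i][j] += delta
    return g


# Ontogenetic (developmental) encoding: module definitions plus a
# developmental program listing which module to build where.


def build_phenotype(definitions, program):
    """Develop the phenotype by expanding every module reference."""
    return [copy.deepcopy(definitions[name]) for name in program]
```

In the direct encoding a mutation always changes exactly one sibling module (divergence, never co-vergence). In the developmental encoding, changing `definitions["m"]` changes every module built from it at once, which is the parallel adaptation (co-vergence) discussed above.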
mechanism   | effect                  | in phylogenetic duplication case | in ontogenetic duplication case
duplication | creates new module      | gene duplicating mutation        | mutation that results in an additional duplication
divergence  | changes one module      | ordinary mutation                | may be supported by mutation (with developmental processes)
co-vergence | changes sibling modules | impossible                       | may be supported by mutation (with developmental processes)

Table 1. The three module-supporting mechanisms and their appearance.

Gruau’s Cellular Developmental Encoding. As an example, we look at the 
encoding by Gruau [5, 6]. Does this encoding support modularity? The only 
candidate for a developmental duplication mechanism is the div instruction. Two 
daughter cells can develop into subnetworks, but there is nothing in the scheme that 
would make these subnetworks related. Therefore, duplication is not implemented, so 
the basic Gruau encoding does not support modularity. 







Figure 2. A sample network (a) and its cellular developmental tree (b). The network has 
the architecture suitable for solving the parity-4 problem. Alternative encodings: (c) 
extended with ADSN, (d) extended with push/pop. 

The simplest way to extend the encoding to support modules is with an identical 
division (idiv) with only one subtree, executed by both cells. Thus more copies of 
the same structure can form. Gruau’s ADSN extension [5] implements the same idea 
with more flexibility. In these extensions, the code tree is generalised to a directed 
acyclic graph. But these extensions are still limited, as they produce only identical 
modules. In conclusion, the ADSN-extended version supports modularity, but in a 
limited way (it implements duplication and co-vergence, but not divergence).
We propose another way to implement identical modules, and one to implement 
divergence. We extend cells to hold a small stack of pointers, and introduce two 
instructions: push saves its right subtree on the stack and executes the left; pop 
returns to the address on the top of the stack (if any). This scheme matches the ADSN 
scheme, but still keeps the code in one tree (see Fig. 2). For divergence, a 
conditional branching instruction can be created: cells maintain a queue of flags 
that record information about their last divisions. Later, when they execute the same 
portion of the code tree, a cond instruction can branch according to this value. 
Finally, Table 2 compares these variants of the cellular encoding by looking at 
their ability to express specific scenarios. Only the last variant is able to express all 
proposed scenarios.



scenario                                           | encoding: basic Gruau, plus:
                                                   | -   | IDIV | ADSN | PUSH | PUSH+COND
After division, the two daughter cells ...         |     |      |      |      |
... execute different code (no modules).           | yes | yes  | yes  | yes  | yes
... execute the same code (identical modules).     | -   | yes  | yes  | yes  | yes
... execute some different code, then the same.    | -   | -    | yes  | yes  | yes
... execute same, then different (divergence).     | -   | -    | -    | -    | yes
Same portion of code appears in arbitrary places.  | -   | -    | yes  | yes  | yes

Table 2. Comparison of different extensions of cellular developmental encoding.
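To make the idiv, push/pop and cond instructions concrete, here is a toy developmental interpreter in Python. It is a hypothetical sketch, not Gruau's encoding: instead of building a network, each final cell reports the trace of non-dividing instructions it executed, which is enough to see whether two daughter cells end up with identical, partly shared, or diverging “modules”. The tuple syntax, the `op` instruction and the rule that pop on an empty stack stops the cell are our own choices.

```python
# Node syntax (hypothetical):
#   ("end",)              the cell stops developing
#   ("op", name, child)   non-dividing instruction: record `name`, continue
#   ("div", left, right)  division; the daughters read different subtrees
#   ("idiv", child)       identical division; both daughters read `child`
#   ("push", left, right) save `right` on the cell's stack, continue with `left`
#   ("pop",)              jump to the subtree on top of the stack (stop if empty)
#   ("cond", a, b)        branch on the oldest division flag: 0 -> a, 1 -> b


def develop(node, stack=(), flags=(), trace=()):
    """Return one trace (tuple of op names) per final cell, left to right.
    Stacks and flag queues are tuples, so each daughter gets its own copy."""
    kind = node[0]
    if kind == "end":
        return [trace]
    if kind == "op":
        return develop(node[2], stack, flags, trace + (node[1],))
    if kind == "div":
        return (develop(node[1], stack, flags + (0,), trace)
                + develop(node[2], stack, flags + (1,), trace))
    if kind == "idiv":
        return (develop(node[1], stack, flags + (0,), trace)
                + develop(node[1], stack, flags + (1,), trace))
    if kind == "push":
        return develop(node[1], stack + (node[2],), flags, trace)
    if kind == "pop":
        if not stack:
            return [trace]
        return develop(stack[-1], stack[:-1], flags, trace)
    if kind == "cond":
        flag, rest = (flags[0], flags[1:]) if flags else (0, flags)
        return develop(node[1] if flag == 0 else node[2], stack, rest, trace)
    raise ValueError(f"unknown instruction: {kind}")


MODULE = ("op", "m", ("end",))
IDENTICAL = ("push", ("div", ("pop",), ("pop",)), MODULE)
DIFFERENT_THEN_SAME = ("push", ("div", ("op", "a", ("pop",)),
                                ("op", "b", ("pop",))), MODULE)
DIVERGENT = ("push", ("div", ("pop",), ("pop",)),
             ("op", "m", ("cond", ("op", "x", ("end",)),
                                  ("op", "y", ("end",)))))
```

`develop(IDENTICAL)` gives `[("m",), ("m",)]`, `develop(DIFFERENT_THEN_SAME)` gives `[("a", "m"), ("b", "m")]`, and `develop(DIVERGENT)` gives `[("m", "x"), ("m", "y")]`. In each case the module code `m` appears only once in the tree, which mirrors the push and push+cond columns of Table 2.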

Weak Encodings. The outlined extensions of the Gruau encoding might be 
criticised as ad hoc and difficult to evolve. What is difficult to capture is the concept of 
being similar yet different. In the Gruau encoding, the fate of a cell is completely 
determined by its genetic code; this is a strong encoding. In weak encodings, however, 
genes do not have total control over the expression of the phenotype. Several 
ANN encoding schemes have been developed that imitate biological ontogenetic 
processes in more detail [8], in which the development of a cell is also influenced by 
other cells and the environment, adding another layer to the genotype-phenotype 
mapping. Due to these higher-order interactions, divergence can occur even if the 
active genes are identical. Weak encodings are beyond the scope of this discussion, 
but we can argue that they provide better support for the emergence of modules.

3 Experimental Results 

We compared the performance of different encodings on a simple task (a temporal 
version of the identity task for words, requiring memory). We compared three encodings: 
a direct ANN encoding (no modules), a basic developmental encoding with no module 
support, and an extended developmental encoding with module support (using push/pop). 
Most solutions were found using the direct encoding, which we attribute to the increasing 
overhead of the more complex encodings (success rates were 87%, 10% and 37% for the 
three cases). However, the modular encoding produced solutions with more structure, 
which were able to generalise much better (generalisation power: 5%, 77%, and 90%). 
Space does not permit us to present further details of the experiments.
4 Conclusions

Emergence of modules (defined by their common origin) depends on the existence of
a duplication process and of at least one kind of adaptation: divergent and/or parallel.
In a system, the collection of genetic and developmental processes determines whether
such processes exist, and thus the system's support for modularity. We distinguish
two routes to module formation: the phylogenetic and the ontogenetic pathways. While
phylogenetic module duplication has good intrinsic support for divergence, it cannot
support co-vergence. In contrast, ontogenetic module duplication can support both
(though it does not necessarily do so).

Gruau's cellular developmental encoding, the encoding used here as an illustration,
does not support modularity in its basic form; some of its extensions do, but only to a
limited degree. Experimental evidence from a simple recurrent-network task shows
that more sophisticated encodings do produce more modular structures, and that these
can indeed be more desirable. However, the more sophisticated an encoding is, the
more time the evolutionary search takes, owing to the increasing overhead. These
limitations provide an incentive to explore more complex encodings that include
interactions. Weak encodings can embed the three mechanisms mentioned above
(duplication, divergent adaptation and parallel adaptation) more naturally. In light of
this, it should come as no surprise that organisms in nature typically 'use'
developmental encodings, and weak ones at that.
Abstract. In this paper we describe our attempt to create a nature-like
simulation model of artificial creatures. The model includes physical simulation
of the creatures, their interaction with the environment, their neural network
control, and both directed and open-ended evolution. We describe a complex,
three-dimensional simulation system in which various fitness criteria can be
selected for evolving species, and in which spontaneous evolution can be run. The
work is still in progress, and we hope to make it a realistic model capable of
producing real-life phenomena through open-ended evolution in a life-like world
of stick creatures.



1 Introduction 

Artificial life research attempts to study real, biological organisms by creating
and analyzing virtual creatures. Existing artificial life experiments seem to fall into
two categories. The first is based on elegant, perfect, but often simple models. These
are usually used for theoretical studies or to test biological hypotheses, like those
concerning coevolution in a pursuer-evader game [2] or the evolution of spider webs
and eyes [1]. Those in the second category use relatively sophisticated models, but
their evolutionary mechanisms are less scientifically and biologically inspired;
instead, they focus on realistic simulation, graphics [7], or entertainment [6]. By
combining the advantages of both approaches in Framsticks, we tried to fill the gap
between advanced artificial life models, with their scientific consequences, and
advanced simulation tools, with their realism. Here, the simulation model is
reasonably biologically realistic, while the evolution model allows a wide range of
experiments.

Our model encompasses a virtual, three-dimensional world and creatures (with
their “bodies” and “brains”) that are capable of interacting with each other (locating,
pushing, hurting, killing, eating, etc.) and with the environment (walking, swimming,
etc.). The environment can be composed of any combination of flat land, hills, and
water. Evolution may be directed by predefined criteria.
Although there have been many experiments so far, no attempt has been made at
spontaneous evolution using environments and creatures as complex as those in
Framsticks. We hope that our model is complex enough to allow the emergence of
sophisticated, life-like dependencies and phenomena, and simple enough to be
simulated on existing computer systems.

The paper is organized as follows: section 2 describes the system architecture and 
models of evolution and simulation. In section 3 we focus on the evolutionary 
properties of our system, describing genotype representation, genetic operators like 
crossover and mutation, etc. In section 4 we briefly discuss the results of experiments 
performed so far, summarize the work and present our future goals. 



2 Simulation model 

2.1 System architecture 

Our aim is to design the Framsticks model so that it allows for open-ended
evolution (including natural selection) of stick creatures, controlled by neural
networks, in a three-dimensional world [4]. In the real world and in some artificial
life simulators (Tierra, Avida), the rules of selection and reproduction emerge from
the simulated creatures’ living conditions. In our simulator, these rules are
predefined, as they are in other evolutionary models designed for optimization, such
as genetic algorithms [3] and most artificial life simulations [7]. Creatures in
Framsticks are evolutionarily optimized according to some predefined criteria.
However, it is possible to mimic open-ended evolution with the “directed” model
of evolution. This is the case when the chosen fitness criterion depends on survival
and reproduction abilities. Thus, open-ended evolution can be simulated by using
the life span selection criterion: the longer a creature lives, the more it reproduces
in the environment, which is broadly analogous to the real-world situation.

The main module of the simulator is the evolution simulator, which is responsible
for maintaining the set of currently existing genotypes. This module must also obtain
their multi-criteria evaluation in order to perform selection. Not all individuals need
to be simulated simultaneously; only a fraction of them are evaluated in the virtual
world at a time. The artificial world is thus a reduced model of the whole
ecosystem. Such an approach is quite universal and also has the following advantages:

• a few individuals can be simulated much faster than a few thousand, so one can watch
the simulation in real time, study the behavior of creatures, and affect them;

• the only information needed to save or load the state of the evolutionary process is
the performance of each genotype; the state of the virtual world is not saved.

In order to construct such an architecture (figure 1), two parameters are needed: a 
maximum number of genotypes, N, and a maximum number of individuals 
simultaneously simulated, n. Usually, n is significantly smaller than N. When the 
interaction between simulated creatures is not important, n may be set to 1. Larger 
values of n mean that a larger part of the whole set of individuals is simulated, and
more interactions between creatures may happen.
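The N/n architecture can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Framsticks implementation. The class and method names are hypothetical, life span stands in for the fitness criterion, and evicting the worst idle genotype when the pool is full is our own choice.

```python
import random


class EvolutionSimulator:
    """Hypothetical sketch: a pool of at most N genotypes, of which at most n
    are simulated in the virtual world at the same time."""

    def __init__(self, N, n, seed=0):
        if not 1 <= n <= N:
            raise ValueError("need 1 <= n <= N")
        self.N, self.n = N, n
        self.rng = random.Random(seed)
        self.performance = {}  # genotype -> last measured life span
        self.in_world = set()  # genotypes currently being simulated

    def add_genotype(self, genotype):
        if genotype in self.performance:
            return
        if len(self.performance) >= self.N:
            # Assumed replacement rule: evict the worst genotype not in the world.
            idle = [g for g in self.performance if g not in self.in_world]
            if not idle:
                return
            worst = min(idle, key=lambda g: self.performance[g])
            del self.performance[worst]
        self.performance[genotype] = 0.0

    def fill_world(self):
        # Instantiate randomly chosen idle genotypes until n creatures are alive.
        idle = sorted(g for g in self.performance if g not in self.in_world)
        self.rng.shuffle(idle)
        while len(self.in_world) < self.n and idle:
            self.in_world.add(idle.pop())

    def creature_died(self, genotype, life_span):
        # Life span serves as the criterion that mimics open-ended evolution.
        self.in_world.discard(genotype)
        self.performance[genotype] = life_span

    def save_state(self):
        # Only per-genotype performance is saved; the world state is not.
        return dict(self.performance)


sim = EvolutionSimulator(N=2, n=1)
sim.add_genotype("a")
sim.add_genotype("b")
sim.fill_world()
print(sim.in_world)
```

Keeping n small (here 1) matches the case in which interactions between creatures do not matter.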



Fig. 1. The system architecture.

2.2 Physical simulation



The module used to evaluate individuals (genotypes) simulates the creatures and their
environment. A three-dimensional simulator is used, in the hope that a range of complex
and varied stimuli affecting organisms will be the origin of dynamic development. The
first behaviors tested were locomotion and orientation in an environment, so all kinds
of interaction between physical objects were considered: static and dynamic friction,
damping, action and reaction forces, energy losses after deformations, gravitation, and
uplift pressure (buoyancy) in a water environment.

The basic element of the creatures is a stick (figure 2) made of two particles
flexibly joined. The finite element method is used for simulation. Sticks can have
various lengths, weights, strengths, friction, etc. Neurons (connected in any way) and
receptors can also be placed on sticks.



Fig. 2. A simple framstick creature. Labelled part: equilibrium receptor (gyroscope).



Muscles (bending and rotating) are placed on stick junctions. The signal that controls 
a muscle changes the relative position of the adjacent sticks. 
2.3 Neural network

Framsticks’ neurons are similar to those widely used in machine learning.
Sophisticated and unnatural processing units (as in [7]) were not introduced; we
showed that it is possible to construct complex modules (integrating, differentiating,
summing, subtracting, and generators with different shapes) from simple neurons.
Neuron properties can additionally be changed by three special parameters (all under
the control of evolution): force, inertia and sigmoid. These parameters modify the way
neurons process signals. Details and sample neuronal runs can be found in [5].
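The neuron equations are not given here; they are deferred to [5]. The sketch below shows one plausible way force, inertia and sigmoid could act, and the update rule is an assumption made for illustration. Force pulls an internal state towards the input, inertia keeps part of the previous velocity, and sigmoid sets the slope of the output function.

```python
import math


class Neuron:
    """Hypothetical neuron with force, inertia and sigmoid parameters.

    The update rule is an illustrative assumption, not taken from the paper.
    """

    def __init__(self, force=0.04, inertia=0.8, sigmoid=2.0):
        self.force = force      # how strongly the state is pulled towards the input
        self.inertia = inertia  # fraction of the previous velocity that is kept
        self.sigmoid = sigmoid  # slope of the output squashing function
        self.state = 0.0
        self.velocity = 0.0

    def step(self, weighted_input):
        """Advance one time step and return the output in (-1, 1)."""
        self.velocity = (self.inertia * self.velocity
                         + self.force * (weighted_input - self.state))
        self.state += self.velocity
        return 2.0 / (1.0 + math.exp(-self.sigmoid * self.state)) - 1.0


n = Neuron()
outputs = [n.step(0.5) for _ in range(200)]
print(round(n.state, 3))  # the state settles near the constant input 0.5
```

With a small force and a high inertia, the state overshoots and oscillates before settling. This is one way a single neuron could act as a damped generator.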

An important aspect of the neural network is its interaction with the virtual world.
Neurons can control muscles (effectors) and can obtain information from receptors.
Currently, there are three kinds of receptors: those for orientation in space
(equilibrium sense, gyroscope), detection of energy/food (smell), and detection of
physical contact (touch).



3 Evolution 

The genotypes used in Framsticks are described textually, so they can be easily read
and modified by a human. Stick phenotypic properties are represented locally, but
propagate through the creature’s structure with decreasing strength. The genotype
describes all the parts of the corresponding phenotype precisely. Small changes in the
genotype cause small changes in the resulting creature. Control elements (neurons,
receptors) are associated with the elements under their control (muscles, sticks). A
current restriction is that only tree-like structures can be represented (cyclic
structures are not allowed).

Both the physical structure (body) and the neural network (brain) are described in the
same genotype. The “body” is made of sticks, which have biological properties
(muscle strength, stamina, assimilation, ingestion, initial energy level), physical
properties (length, weight, friction) and joint properties (rotation, twist, curvedness).
The “brain” is made of neurons. The neural network can have any topology and
complexity. An important property is that neural connections are described relatively.
This lets sub-nets survive the crossover operation: the whole set of neurons can be
moved to another place in the genotype (and in the creature), possibly together with
limbs, and still be operational.
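A small sketch shows why relative connections let a sub-net survive a move. The genotype layout is a hypothetical simplification of the textual Framsticks genotype: a list of named neurons, each listing its inputs as (offset, weight) pairs.

```python
# Hypothetical linear genotype: each entry is (name, inputs), where inputs are
# (relative_offset, weight) pairs pointing at other entries of the genotype.


def resolve_inputs(genotype, index):
    """Return (source_name, weight) pairs for the neuron at `index`."""
    name, inputs = genotype[index]
    resolved = []
    for offset, weight in inputs:
        j = index + offset
        if not 0 <= j < len(genotype):
            raise IndexError(f"connection of {name} points outside the genotype")
        resolved.append((genotype[j][0], weight))
    return resolved


def move_block(genotype, start, end, dest):
    """Cut genotype[start:end] and reinsert it at position `dest` of the rest."""
    block = genotype[start:end]
    rest = genotype[:start] + genotype[end:]
    return rest[:dest] + block + rest[dest:]


# Sub-net {A, B}: B reads from the neuron one position before it (offset -1).
genotype = [("X", []), ("A", []), ("B", [(-1, 0.5)])]
moved = move_block(genotype, 1, 3, 0)  # -> [A, B, X]
print(resolve_inputs(genotype, 2))     # [('A', 0.5)]
print(resolve_inputs(moved, 1))        # [('A', 0.5)]: the connection survives
```

With absolute indices, B would still point at position 1 after the move, which now holds B itself, and the sub-net would be broken.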

Two genetic operators were introduced: crossover and mutation. Mutation covers
many kinds of genotypic change, each having an adjustable relative probability. The
crossover operator is a two-point one. A simple repair procedure is used, which can
fix small errors and turn an invalid genotype into a valid one.
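A two-point crossover on textual genotypes can be sketched as below. This is a generic string version, assumed for illustration. It ignores genotype syntax, so in practice a repair step like the one described above would still be needed afterwards.

```python
import random


def two_point_crossover(parent_a, parent_b, rng):
    """Swap the segment between two cut points of each parent.

    Cut points are drawn independently in each parent (an assumption), so
    children may differ in length from their parents.
    """
    i, j = sorted(rng.sample(range(len(parent_a) + 1), 2))
    p, q = sorted(rng.sample(range(len(parent_b) + 1), 2))
    child_a = parent_a[:i] + parent_b[p:q] + parent_a[j:]
    child_b = parent_b[:p] + parent_a[i:j] + parent_b[q:]
    return child_a, child_b


rng = random.Random(42)
print(two_point_crossover("XXXXXXXX", "yyyyyyyy", rng))
```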

In nature, groups of similar individuals share the same ecological niche and
constitute species. In Framsticks, similarity to other coexisting species lowers a
given species’ fitness [3]. This introduces pressure to diversify the populations of
species. The second mechanism that supports speciation is a specific crossover
operation: the corresponding parts of the genotypes of similar species are exchanged.
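The similarity penalty resembles fitness sharing. Below is a minimal sketch under two assumptions. First, the similarity measure is Python's difflib.SequenceMatcher ratio on genotype strings. Second, raw fitness is divided by one plus the summed similarity to the other species. The method in [3] may well define both differently.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Assumed similarity in [0, 1] between two textual genotypes."""
    return SequenceMatcher(None, a, b).ratio()


def shared_fitness(raw, genotypes):
    """Lower each species' fitness according to how similar it is to the
    other coexisting species (a fitness-sharing style penalty)."""
    result = []
    for i, g in enumerate(genotypes):
        crowding = sum(similarity(g, h)
                       for j, h in enumerate(genotypes) if j != i)
        result.append(raw[i] / (1.0 + crowding))
    return result


print(shared_fitness([1.0, 1.0, 1.0], ["XXX", "XXX", "QQQ"]))  # [0.5, 0.5, 1.0]
```

The two identical species halve each other's fitness, while the distinct one keeps its full fitness. This creates the pressure to diversify.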
4 Conclusions and future work

The evolutionary experiments performed so far mainly concerned directed evolution,
with fitness defined as speed (on the ground and in water). Many walking and
swimming species evolved during these runs [5]. Usually, the first idea of “how to
move” is a neuron connected recurrently to itself and controlling a bending muscle.
Better methods of locomotion on the ground include chaotic pushing back, where the
changing signals come from the equilibrium sense or touch receptors. After further
evolution, the movement becomes more purposeful, and redundant parts are removed.

During the experiments, we had to modify the simulation rules and fix bugs several
times. Evolution turned out to be a very good method of searching the space of
solutions (organisms), and was capable of finding fit individuals regardless of whether
they were sensible or valid. Faults in the simulator were sooner or later discovered by
evolution, used by simulated organisms, and exploited to the greatest possible extent.

Currently, the genotypic representation seems to be the main limitation, because it
does not allow easy evolution of complex organism structures. Our future work
will thus concern a better representation: instead of coding the structure
and neural network linearly, the genotype will describe a way of creating (growing)
an organism. Such an approach is closer to nature, and may make the search of the
organism space even more effective.

Our future work will also concern defining a better similarity function, improving
the simulation rules and their parameters, and open-ended simulations. More
receptors and more complex criteria for directed evolution may be introduced.

We hope that after further development of open-endedness in our three- 
dimensional simulation model, evolution will create organisms with more complex 
behaviors, and realistic, life-like phenomena will emerge. 
Abstract. This paper describes the generation and selection of visual
feature detectors. The feature detectors are randomly generated, and
are built out of components, some having functionality inspired by
observations of animal visual pathways. The input for the feature detectors
consists of non-synthetic images, while the selectionist pressure comes
from the amount of information the feature detectors generate. The
experimental setup is described and some results are given.



1 Introduction 

In the same sense that Darwinian processes are the driving force for evolving
living organisms, we believe that Darwinian-like development is responsible for the
functional adaptation of the brain to its environment and tasks. The brain then behaves
very much like a micro-ecology of competing signal processing pathways, in which more
often used pathways are strengthened and less used pathways tend to disappear. This
process needs diversity, here provided by a random generator; furthermore, a
selectionist mechanism is required, which selects from the raw material according
to a certain selectionist pressure. This selectionist pressure depends on two factors.
First, there is the sensory input to the processing pathways: depending on the input,
particular functionality may or may not develop. Second, the task influences the
selection of functionality. If the success of the task relies heavily on certain
processing pathways, then these pathways will be strengthened, while disuse of
pathways will result in their disappearance. This paper focuses on evolving visual
functionality: generating and selecting visual feature detectors using selectionist
mechanisms. Each feature detector is built out of simple primitives performing basic
visual processing, some of which are inspired by functionality observed in the visual
pathways of animals. This approach differs from the statistical analysis of natural
images [6], in which regularities in images are used to attain data reduction, and from
biologically inspired object recognition [4], in which a set of carefully engineered
feature detectors is used. The input consists of images showing objects or faces. The
output of the visual feature detectors is typically fed to a task; this can be a cognitive
task, e.g. object discrimination, or a sensori-motor task, e.g. visual taxis for an
animat. However, for the experiments described here no task is used, for reasons
explained later.
2 Experimental setup

During an experiment a set of images is presented to a set of feature detectors.
For each image $r$, every feature detector produces a scalar output $o_i \in [0, 1]$. If
there are $N$ feature detectors, an $N$-dimensional vector $\vec{o} = (o_1, o_2, \ldots, o_N)$
is produced. This vector is then used by the task to act according to the presented
visual stimulus.



2.1 The selectionist architecture 

For the experiments a strongly typed genetic program is used; this incorporates a
random generator and a selectionist architecture. We use three parameter types:
scalar is defined as $x \in [0, 1]$; point is an ordered pair of scalars, defining a
relative image coordinate; and image refers to an image of fixed size. Table 1
shows the terminals used. The random ephemeral constant is a random number
of type scalar, which keeps its value during the existence of an individual. Table
2 shows the non-terminals. There is one extra non-terminal, not mentioned in the
table, which combines all feature detectors: it accepts N scalars (with N being
the number of feature detectors) and has no output type.
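Strong typing constrains random generation so that every argument receives a subtree of the required type. The sketch below grows random trees from the primitives in Tables 1 and 2. Table 1 lists no image-typed terminal, so the sketch assumes a hypothetical inputImage terminal standing for the presented image. The 0.3 early-stop probability is also arbitrary, and the N-ary combining node is omitted.

```python
import random

# Terminals from Table 1 (name -> type). "inputImage" is an assumed terminal
# standing for the presented image; Table 1 does not list one.
TERMINALS = {
    "leftTop": "point", "rightBottom": "point",
    "zero": "scalar", "one": "scalar",
    "inputImage": "image",
}

# Non-terminals from Table 2 (name -> (argument types, output type)).
FUNCTIONS = {
    "averageIntensity": (("image", "point", "point"), "scalar"),
    "thresholdImage": (("image", "scalar", "scalar"), "image"),
    "spatialFilter": (("image", "scalar"), "image"),
    "spatialResponse": (("image", "scalar"), "scalar"),
    "orientationResponse": (("image", "scalar"), "scalar"),
    "combineImage": (("image", "image"), "image"),
    "cons": (("scalar", "scalar"), "point"),
    "amplify": (("scalar", "scalar"), "scalar"),
    "attenuate": (("scalar", "scalar"), "scalar"),
}


def random_tree(out_type, depth, rng):
    """Grow a random tree whose root returns `out_type` (strong typing)."""
    terms = [t for t, ty in TERMINALS.items() if ty == out_type]
    if out_type == "scalar":
        terms.append("ephemeral")  # random ephemeral constant
    funcs = [f for f, (_, ty) in FUNCTIONS.items() if ty == out_type]
    if depth <= 0 or rng.random() < 0.3:
        name = rng.choice(terms)
        # An ephemeral constant is drawn once and then kept in the tree.
        return round(rng.random(), 3) if name == "ephemeral" else name
    name = rng.choice(funcs)
    arg_types, _ = FUNCTIONS[name]
    return (name, *[random_tree(t, depth - 1, rng) for t in arg_types])


def tree_type(tree):
    """Check a tree's typing recursively and return its output type."""
    if isinstance(tree, float):
        return "scalar"
    if isinstance(tree, str):
        return TERMINALS[tree]
    name, *args = tree
    arg_types, out_type = FUNCTIONS[name]
    if len(args) != len(arg_types) or any(
            tree_type(a) != t for a, t in zip(args, arg_types)):
        raise TypeError(f"ill-typed call to {name}")
    return out_type


rng = random.Random(1)
tree = random_tree("scalar", 3, rng)
print(tree, "->", tree_type(tree))
```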

Some primitives are chosen to ease the image processing, while others are
inspired by observations of mammalian visual processing pathways. The
orientationResponse primitive is based on the orientation specificity of the visual
cortex [7], while the spatialResponse and spatialFilter primitives are based
on the observation that mammals perform spatial filtering at a retinal and
cortical level [2]. No spectrally sensitive primitives were used, since these would bias
the discrimination of the visual stimuli too much.



2.2 The task and the feedback it provides 

The selectionist pressure depends on the environment (in casu, the visual input)
and on the task of the individual. In the experiments no particular task is used;
instead, the fitness of the feature detectors is determined by the information content
of their outputs. This does not commit the evolution of the feature detectors
to a specific task, but gives a general measure of the usefulness of the feature
detectors, and allows for faster experimental runs.

The information content is a measure of the spread of the feature detector



Terminal | Type
leftTop | point
rightBottom | point
zero | scalar
one | scalar
random ephemeral constant | scalar

Table 1. Terminals for the strongly typed GP. 
Function | Functionality | Input types | Output type
averageIntensity | Computes average intensity of an image region | image, point, point | scalar
thresholdImage | Filters out all pixels not in range [a, b] | image, scalar, scalar | image
spatialFilter | Filters the image using a Laplacian of Gaussian filter with size σ, as defined by [3] | image, scalar | image
spatialResponse | Gives the response of a Laplacian of Gaussian filter with size σ | image, scalar | scalar
orientationResponse | Gives the response to edges in a particular direction α | image, scalar | scalar
combineImage | Makes an image by taking the average of two input images | image, image | image
cons | Combines two scalars in an ordered pair | scalar, scalar | point
amplify | Amplifier, param1 / param2 | scalar, scalar | scalar
attenuate | Attenuator, param1 · param2 | scalar, scalar | scalar


Table 2. Non-terminals for the strongly typed GP. The last three non-terminals do 
not perform image processing. 



outputs $o_i$ in the $N$-dimensional space. The fitness of a set of N feature detectors
is calculated by taking the entropy in the discretised N-dimensional space. Each
dimension is divided into M equal divisions, giving a total of $P = M^N$ partitions.
Figure 1 illustrates this. If $I$ is the number of images presented to a set of feature
detectors, the information content is calculated as in eq. 1, where $x_i$ is the number of
outputs in partition $i$.




$$C = -\sum_{i=1}^{P} \frac{x_i}{I} \ln \frac{x_i}{I}, \qquad \text{for } x_i > 0 \qquad (1)$$



The maximal information content $C_{max} = \sum_{i=1}^{I} \left(-\frac{1}{I} \ln \frac{1}{I}\right) = \ln I$ is reached when
every partition contains one output or no output at all. The fitness of an individual,
with a maximum of 1, is defined as:

$$f = \frac{1}{1 + (C_{max} - C)} \qquad (2)$$

If the outputs are well spread in the N-dimensional space, the fitness will be high;
clustered outputs, on the other hand, will result in a low fitness. Note that the more
partitions there are, the easier it is to reach a high fitness. There should be at least
I partitions (with I being the number of images), so $M^N \geq I$, i.e. every dimension
should be divided into at least $\lceil I^{1/N} \rceil$ divisions. Note also that it is important
to evaluate all the outputs of all feature detectors in one fitness measure, instead of
calculating the fitness of every single feature detector and then just summing the
fitness values. The latter would be exploited by the GP to
Fig. 1. Illustration of a partitioned output space: there are N = 3 feature detectors,
and each dimension is split into M = 3 divisions, giving a total of $P = M^N = 27$
partitions. The dots represent outputs; ideally every partition should contain at most
one output.




Fig. 2. Sample of some input images.



reach a high fitness while evolving feature detectors that are all similar, whereas the
former method ensures that the feature detectors will all differ significantly.
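Equations 1 and 2 translate directly into code. The sketch below is a plain Python reading of them. It assumes outputs lie in [0, 1] and that an output of exactly 1 falls into the last division. The helper min_divisions computes the smallest M with M^N ≥ I.

```python
import math
from collections import Counter


def partition_index(outputs, M):
    """Map an N-dimensional output vector (values in [0, 1]) to its partition."""
    # min(..., M - 1) keeps an output of exactly 1.0 in the last division.
    return tuple(min(int(o * M), M - 1) for o in outputs)


def information_content(output_vectors, M):
    """Eq. 1: entropy of the outputs over the P = M**N partitions."""
    I = len(output_vectors)
    counts = Counter(partition_index(o, M) for o in output_vectors)
    # Only occupied partitions (x_i > 0) appear in `counts`.
    return -sum((x / I) * math.log(x / I) for x in counts.values())


def fitness(output_vectors, M):
    """Eq. 2: f = 1 / (1 + (C_max - C)), with C_max = ln I."""
    c_max = math.log(len(output_vectors))
    return 1.0 / (1.0 + (c_max - information_content(output_vectors, M)))


def min_divisions(I, N):
    """Smallest M such that M**N >= I, i.e. at least I partitions."""
    M = 1
    while M ** N < I:
        M += 1
    return M


spread = [(0.1, 0.1), (0.1, 0.9), (0.9, 0.1), (0.9, 0.9)]
clustered = [(0.1, 0.1)] * 4
print(fitness(spread, 2), fitness(clustered, 2))
```

Spread outputs reach the maximum fitness of 1. Four identical outputs give 1 / (1 + ln 4), the lowest value possible for four images.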



3 Results 

The results described here all use the same input: 20 monochromatic images
of everyday objects¹. To evaluate a set of feature detectors, all images are
presented to the set and the fitness is computed as in eq. 2.

The genetic program is run for 500 generations, with a crossover rate of 0.8, 
mutation rate of 0.1 and reproduction rate of 0.1; note that the reproduction 
rate is actually higher due to failed crossover operations. Figure 3 shows the 
fitness of a run. 

Results (not reported here) show that different input images (showing objects
or faces) influence the selection of primitives (for more results and a more elaborate
exposition of design choices, see [1]). For inputs showing objects, the feature
detectors rely on intensity to categorize the inputs, while inputs showing faces
tend to produce feature detectors that rely on orientation and spatial
information.

A second observation is that the depth of the feature detectors is quite low.
A run in which no selectionist feedback was given produced feature detectors
with an average depth of 7.3 and, on average, 130 nodes, whereas the run shown in
figure 3 produced feature detectors of average depth 3.2 and size 25. This is
due to the attenuating effect of large trees: more primitives (connected to each
other serially) tend to filter out information, until no information is left.

¹ The images are taken from the Columbia Object Image Library (COIL-20) at
http://www.cs.columbia.edu/CAVE/.
4 Conclusion

This paper demonstrates how feature detectors that discriminate visual stimuli
can evolve. The outputs of the feature detectors are presented to a task, which
provides selectionist pressure. Future experiments should add more low-level,
context-free primitives (such as a symmetry detector, lateral inhibitory feature
detectors and corner/junction detectors). Further, the number of feature
detectors should be made variable and the task adaptive.
Fig. 3. Result of a run with individuals consisting of a set of 5 feature detectors. Each
dimension has M = 5 divisions, thus creating $P = M^N = 5^5$ partitions. The graph
shows the mean fitness and the fitness of the best-of-generation individual.
Abstract. This paper describes an evolutionary way to design the behaviors
of a mobile robot for recognizing environments. We have proposed
an action-based approach (called AEM) for a mobile robot to recognize
environments. In AEM, a behavior-based mobile robot acts in each
environment and action sequences are obtained. The action sequences
are transformed into vectors characterizing the environments, and the
robot identifies the environments with these vectors. Designing suitable
behaviors for AEM is very difficult for humans because the search
space is huge and intuitive understanding is hard. We therefore develop an
evolutionary design method for such behaviors using a genetic algorithm.



1 Introduction 

Most studies on recognizing environments have tried to build a precise
geometric map using a robot with highly sensitive, global sensors. However,
just to recognize environments, such a strict map may be unnecessary. Thus
we have tried to build a mobile robot that recognizes environments only with
low-sensitivity, local sensors, and we have proposed an approach in which a mobile
robot recognizes the environment from action sequences. We call this approach AEM
(Action-based Environment Modeling) [5]. In AEM, a mobile robot acts in
environments using given suitable behaviors, such as wall-following. The action
sequences executed in each environment are then obtained and transformed into
environment vectors. The robot identifies the environments by comparing these vectors.

Through the research on AEM, we recognized a significant problem: where do
the suitable behaviors come from? An easy solution is for a human to design the
behaviors. However, the task becomes quite difficult for a human designer as
the variety of environments increases. In this paper, we propose an evolutionary
design method for such behaviors using a GA (Genetic Algorithm) and carry out
experiments to evaluate it.

Several studies using approaches similar to AEM have been done in robotics
[2] and artificial life [4]. In most of this research [2] [4] [5], wall-following has been
used as the suitable behavior. Unfortunately, the behaviors were described by
human designers, and fixed.
Fig. 1. Overview of AEM

Fig. 2. Khepera

Fig. 3. Sensor positions



2 Task: Action-based Environment Modeling 

In AEM [5], a mobile robot is designed using a behavior-based approach [1]. A
behavior is a mapping from states to actions. An AEM procedure consists
of two stages: a training phase and a test phase (Fig. 1). In the training phase,
training environments, each having a class, are given to a robot. A class is the
category to which an environment should belong. The mobile robot acts
in each environment using the given behaviors and obtains the sequence of executed
actions (called an action sequence) for each of them. The action sequences are
transformed into real-valued vectors (called environment vectors) using a method similar
to chain coding. The environment vectors are stored as instances, and the training
phase finishes.

In the test phase, a robot is placed in a test environment, which is one of the
training environments. The robot tries to identify the test environment with one
of the training environments. The identification is done using the 1-Nearest Neighbor
method, i.e. the robot selects the instance most similar to the test environment
and assigns that instance's class to the test environment. Similarity is evaluated by
the Euclidean distance between environment vectors.
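The 1-Nearest Neighbor identification step could look like the sketch below. The text does not say how vectors of different lengths are compared, so zero-padding them to a common length is an assumption. math.dist requires Python 3.8 or later.

```python
import math


def pad(vector, m):
    """Assumed convention: zero-pad a vector to length m."""
    return list(vector) + [0.0] * (m - len(vector))


def classify_1nn(test_vector, instances):
    """Return the class of the stored instance nearest to `test_vector`.

    `instances` is a list of (environment_vector, class_label) pairs.
    """
    m = max([len(test_vector)] + [len(v) for v, _ in instances])
    target = pad(test_vector, m)
    best_label, best_dist = None, math.inf
    for vector, label in instances:
        d = math.dist(target, pad(vector, m))  # Euclidean distance
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


instances = [([0, 1, 1], "L"), ([0, 0, 0], "emp")]
print(classify_1nn([0, 1, 2], instances))  # L
```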

Since suitable behaviors depend on the structure of the environments that a robot
should recognize, they have so far been described by human designers. However,
this task is very difficult because of the huge search space.



3 States, actions and environment vectors 

Using real mobile robots as individuals in GA is not practical because it is 
impossible to operate several tens of real robots over more than one hundred 
generations. Thus we use a simulator of Khepera (Fig. 2). As shown in Fig. 3, it 
has two DC motors as actuators and eight infrared proximity sensors, which
measure both the distance to obstacles and light strength.

We describe a state by the range of a sensed value. To reduce the search
space of behaviors, we restrict states and actions. A sensor on Khepera returns
10-bit values for distance and light strength, so we transform the distance
into a binary value, 0 or 1: “0” means an obstacle exists within 3 cm of
the robot, and “1” means it does not. Furthermore, only three (front, left
and right) of the eight sensors are used. Next, states for light strength are described
using only 4 sensors (front, left, right and back). We describe such a state by the
sensor with the strongest light value together with a binary value indicating whether
the light is “near” or “far”. A state in which all of the sensors have almost the same
value is also considered. As a result, the number of light states is nine, and the total
number of states is 72 ($= 2^3 \times 9$). We also define four actions: A1: go 5 mm
straight on; A2: turn 30° left; A3: turn 30° right; A4: turn 180° left.
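One way to index the 72 states is sketched below. Several choices here are hypothetical and made only for illustration. They include the ordering of the index, the near threshold and the uniformity tolerance. They also include the convention that larger readings mean stronger light; raw readings on a real Khepera may follow a different convention.

```python
LIGHT_SENSORS = ("front", "left", "right", "back")


def obstacle_bits(distances_cm, threshold_cm=3.0):
    """Front, left, right distances -> bits: 0 = obstacle within 3 cm, 1 = none."""
    return tuple(0 if d <= threshold_cm else 1 for d in distances_cm)


def light_state(values, near_threshold, uniform_tolerance):
    """Map the 4 light readings to one of 9 light states.

    Returns 8 if all readings are about equal; otherwise 2 * s + near, where
    s indexes the strongest sensor in LIGHT_SENSORS and near is 0 or 1.
    Larger readings are assumed to mean stronger light.
    """
    if max(values) - min(values) <= uniform_tolerance:
        return 8
    s = max(range(len(LIGHT_SENSORS)), key=lambda i: values[i])
    near = 1 if values[s] >= near_threshold else 0
    return 2 * s + near


def state_index(bits, light):
    """Combine 3 obstacle bits (8 values) and a light state (9 values)."""
    return (bits[0] * 4 + bits[1] * 2 + bits[2]) * 9 + light


bits = obstacle_bits([2.0, 10.0, 10.0])   # obstacle only in front
light = light_state([10, 40, 10, 10], near_threshold=50, uniform_tolerance=1)
print(bits, light, state_index(bits, light))  # (0, 1, 1) 2 29
```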

The generated action sequence is transformed into an environment vector.
Let an action sequence and its environment vector be $[a_1, a_2, \ldots, a_n]$ ($a_i \in \{A1, A2, A3, A4\}$)
and $V = (v_1, v_2, \ldots, v_m)$ ($m > n$), respectively, with $v_0 = 0$. The
values of $V$ are determined by four rules: if $a_i = A1$ then $v_i = v_{i-1}$;
if $a_i = A2$ then $v_i = v_{i-1} + 1$; if $a_i = A3$ then $v_i = v_{i-1} - 1$; if $a_i = A4$
then $v_i = -v_{i-1}$. These rules change the vector value when the direction of movement
changes, in a way similar to chain coding, a popular method in pattern
recognition.
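The four chain-coding rules amount to a short loop. This sketch assumes actions are given as the strings "A1" to "A4". It returns only v_1, ..., v_n and leaves out however the remaining components up to m are produced.

```python
def environment_vector(actions):
    """Apply the four chain-coding rules, starting from v_0 = 0."""
    v_prev = 0
    vector = []
    for a in actions:
        if a == "A1":      # go straight: value unchanged
            v = v_prev
        elif a == "A2":    # turn 30 degrees left
            v = v_prev + 1
        elif a == "A3":    # turn 30 degrees right
            v = v_prev - 1
        elif a == "A4":    # turn 180 degrees: value negated
            v = -v_prev
        else:
            raise ValueError(f"unknown action {a!r}")
        vector.append(v)
        v_prev = v
    return vector


print(environment_vector(["A1", "A2", "A2", "A1", "A3", "A4"]))  # [0, 1, 2, 2, 1, -1]
```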

4 Applying GA to acquire behaviors 

GA procedure and coding. We use a simple GA procedure with the following
parameters. We use a coding in which one of the actions {A1, ..., A4} is
assigned to each state.

- Population size: 50; crossover operator: uniform crossover.

- Selection method: elite strategy and tournament selection (tournament size 2).

- Crossover rate P_cross: 0.8; mutation rate P_mut: 0.05.
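A compact GA with these settings could look like the following sketch. Two interpretations are assumptions: the mutation rate is applied as a per-gene probability, and the elite strategy copies the single best genome unchanged. The fitness function is passed in as a parameter, because the real one requires the Khepera simulation.

```python
import random

ACTIONS = ("A1", "A2", "A3", "A4")
N_STATES = 72  # one gene (an action) per state


def tournament(pop, fits, rng, size=2):
    picks = rng.sample(range(len(pop)), size)
    return pop[max(picks, key=lambda i: fits[i])]


def uniform_crossover(a, b, rng):
    mask = [rng.random() < 0.5 for _ in a]
    child_a = [x if m else y for x, y, m in zip(a, b, mask)]
    child_b = [y if m else x for x, y, m in zip(a, b, mask)]
    return child_a, child_b


def mutate(genome, rate, rng):
    # Assumed: each gene is replaced by a random action with probability `rate`.
    return [rng.choice(ACTIONS) if rng.random() < rate else g for g in genome]


def evolve(fitness, generations, pop_size=50, p_cross=0.8, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.choice(ACTIONS) for _ in range(N_STATES)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(g) for g in pop]
        best = max(range(pop_size), key=lambda i: fits[i])
        nxt = [list(pop[best])]  # elite strategy: the best genome survives unchanged
        while len(nxt) < pop_size:
            a, b = tournament(pop, fits, rng), tournament(pop, fits, rng)
            if rng.random() < p_cross:
                a, b = uniform_crossover(a, b, rng)
            nxt += [mutate(a, p_mut, rng), mutate(b, p_mut, rng)]
        pop = nxt[:pop_size]
    fits = [fitness(g) for g in pop]
    best = max(range(pop_size), key=lambda i: fits[i])
    return pop[best], fits[best]


# Toy fitness for illustration only: count how many states map to A1.
genome, score = evolve(lambda g: g.count("A1"), generations=20)
print(score)
```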



Defining a fitness function. We introduce three conditions for behaviors suitable
for AEM: termination of actions, accuracy of recognition, and efficiency of recognition.
A fitness function is defined for each condition, and the three are then integrated.



A robot has to stop when it returns to the neighborhood of its start point.
Termination is evaluated with

$$g = \frac{(\text{No. of E-trials}) + (\text{No. of H-trials})}{2 \times (\text{Total No. of trials})}$$

where E-trials and H-trials are, respectively, trials in which the robot escaped from the
neighborhood of the start point and trials in which it succeeded in returning. Accuracy
of identifying environments is also important. It is evaluated with

$$h = \frac{\text{No. of correctly identified test environments}}{\text{Total No. of tests}}$$

with $h = 0$ when $g \neq 1$. In AEM, the number of actions should be as small as possible
for efficiency. Hence we introduce

$$k = 1 - \frac{1}{N_{env}} \sum_{i=1}^{N_{env}} \frac{s_i}{s_{max}}$$

where $N_{env}$ is the number of environments, $s_i$ is the size of the action sequence in
environment $i$, and $s_{max}$ is the given limited
Table 1. Experimental results in Exp-1

Env. | Train. env. | GN | MaxF
(a) | {emp, L} | 1.0 (0) | 2.80 (0.106)
(b) | (a) + L2 | 2.6 (1.84) | 2.43 (0.131)
(c) | (b) + iL | 2.8 (1.69) | 2.44 (0.025)
(d) | (c) + s-emp | 2.8 (1.75) | 2.44 (0.093)
(e) | 10 env. | 2.1 (0.738) | 2.51 (0.024)
(f) | 12 env. | 5.2 (3.08) | 2.48 (0.07)

Table 2. Experimental results in Exp-2

Env. | Train. env. | GN | MaxF
(g) | {emp, 1-la} | 1.6 (0.966) | 2.68 (0.196)
(h) | (g) + 2-la | 4.8 (2.78) | 2.59 (0.131)
(i) | (h) + 3-la | 9.3 (5.19) | 2.59 (0.111)
(j) | (i) + 4-la | 10.0 (5.94) | 2.62 (0.075)



size of an action sequence, and $k = 0$ when $h \neq 1$. Finally, we integrate the three
fitness functions into $f = g + h + k$, which has range [0, 3]. The usefulness of this fitness
function is demonstrated by the successful experiments described next.
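For reference, the combined fitness can be written as a small function. It makes three assumptions: h is zeroed unless g = 1, k is zeroed unless h = 1, and k is one minus the mean ratio s_i / s_max over environments. The last of these is a reading of a partly illegible formula, so treat it as hypothetical.

```python
def aem_fitness(n_escaped, n_returned, n_trials, n_correct, n_tests,
                sequence_sizes, s_max):
    """Combined fitness f = g + h + k in [0, 3] (reconstructed reading)."""
    # Termination: escaping and returning each count for half of g.
    g = (n_escaped + n_returned) / (2 * n_trials)
    # Accuracy: only counted once termination is perfect (assumed).
    h = n_correct / n_tests if g == 1 else 0.0
    # Efficiency: shorter action sequences score higher (assumed form).
    if h == 1:
        k = 1 - sum(sequence_sizes) / (len(sequence_sizes) * s_max)
    else:
        k = 0.0
    return g + h + k


print(aem_fitness(10, 10, 10, 5, 5, [200, 400], 1000))  # approximately 2.7
```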

5 Experiments with simulation 

We implemented the system using a Khepera simulator [3] and conducted experiments. In all experiments, each of the training environments is given to a robot once. The robot acts in the environments, and the environment vectors transformed from the action sequences are stored as instances. Next, each training environment is given to the robot as a test environment, and the robot identifies each test environment with one of the training environments. In all the experiments, we ran 10 trials with different initial populations and computed the average and standard deviation of the generation number at which the GA stopped.
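The identification rule itself is not given in this part of the paper. As an illustrative assumption only, a 1-nearest-neighbour rule over the stored environment vectors with Euclidean distance could look like this; `identify` and `identification_accuracy` are hypothetical names, and all vectors are assumed to have the same length.

```python
import math

def identify(test_vec, instances):
    """Return the label of the stored instance closest to test_vec
    (1-nearest neighbour with Euclidean distance; an assumed rule)."""
    best_label, best_dist = None, math.inf
    for label, vec in instances.items():
        d = math.dist(test_vec, vec)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

def identification_accuracy(test_set, instances):
    """Fraction of test environments identified with their own training label."""
    correct = sum(identify(vec, instances) == label for label, vec in test_set.items())
    return correct / len(test_set)
```

The resulting fraction is the kind of quantity that enters the accuracy term $h$.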

Exp-1: Environments with different contours in shape. First, we conducted Exp-1 using environments whose contours differ in shape. The experimental results are shown in Table 1. Four nested subsets ((a) to (d) in Table 1) of the five environments {emp, L, L2, iL, s-emp} were given to a robot. Additionally, sets of 10 and 12 environments with different shapes were used ((e) and (f) in Table 1). “GN” is the generation number at which the GA stopped, and “MaxF” is the maximum fitness value at GN. The GN and MaxF entries are averages, with standard deviations in parentheses; this format is common to all the experimental results. Fig. 4 shows the action traces of the best individuals at GN in (d). In such simple environments, behaviors suitable for AEM were obtained within a few generations. The standard deviations were large for GN and small for MaxF, and this tendency was observed throughout all the experiments. As Fig. 4 shows, different action sequences were obtained depending on the structure of the environments.



Exp-2: Environments with different lights. Next, by adding lights that differ in number and position, we made five environments: {emp, 1-la, 2-la, 3-la, 4-la}. Exp-2 uses nested subsets of these environments. The lights were strong enough that a robot could detect their direction from any place. The experimental results are shown in Table 2. Fig. 5 shows the action trace of the best individual at GN in (j). In the figure, a black circle stands for a light.
Fig. 4. Trace of actions in Exp-1 (five environments): (a) emp, (b) L, (c) L2, (d) iL, (e) s-emp.

Fig. 5. Trace of actions in Exp-2: (a) emp, (b) 1-la, (c) 2-la, (d) 3-la, (e) 4-la.



Although GN was larger than in Exp-1, suitable behaviors were still obtained. Note that the behaviors in Fig. 5 are not intuitively understandable. This suggests that it would be very hard for a human to design such behaviors by hand-coding, and that this automatic design method is effective.



6 Conclusion 

We proposed an evolutionary method for designing behaviors suitable for AEM. A GA was applied to search for the behaviors, with simulated mobile robots as individuals. States and actions were described for coding chromosomes, and we carefully defined the fitness function. We conducted experiments using environments that differ in shape and in lights, and found that our approach is promising for learning behaviors suitable for AEM.
Abstract. In this paper, the processes of exploration and of incremental learning in the robot navigation task are studied using the dynamical systems approach. A neural network model which performs forward modeling, planning, consolidation learning and novelty rewarding is used for the robot experiments. Our experiments showed that the robot repeated a few variations of travel patterns at the beginning of the exploration, and later explored the workspace more diversely by combining and mutating the previously experienced patterns. Our analysis indicates that internal confusion due to immature learning plays the role of a catalyst in generating diverse action sequences. It is found that these diverse exploratory travels eventually enable the robot to acquire a rational model of the environment.



1 Introduction 

One of the debates in behavior-based robotics is whether or not agents should 
possess higher-order cognitive functions such as internal modeling, planning and 
reasoning. Most researchers in behavior-based robotics have rejected the “representation and manipulation” framework, since they consider that the representation cannot be grounded and that the mental manipulation of the representation cannot be situated adequately in the behavioral context of the robot in its real-world environment. This argument seems valid if the agent’s mental architecture employs the symbolist framework. One of the major difficulties in the symbolist framework is that the logical inference mechanism used in planning or reasoning assumes a completely consistent model of the world. This presumption cannot be satisfied if learning must proceed dynamically, as in animal and human adaptation processes. It is, however, also true that embodying higher-order cognitive functions is crucial if we attempt to reconstruct human-level intelligence in robots, since even two-year-old human infants are said to possess primitive capabilities of modeling and planning within their adopted environment.

We consider that an alternative to the symbolist framework can be found in 
the dynamical systems approach [4, 1] in which the internal cognitive processes 
are considered to exist in tight coupling with the external environmental pro- 
cesses [1]. Our previous study in navigation learning demonstrated that a robot 
using a recurrent neural net (RNN) is able to learn the “grammatical” structure hidden in the environment, embedded in attractor dynamics with a fractal structure, from the experiences of sensory-motor interactions [8]. The forward dynamics [3] of the RNN generate a mental image of future behavior sequences driven by the acquired attractor dynamics. The crucial argument in that study is that the situatedness of the higher cognitive processes is explained on the basis of the entrainment of the internal dynamics by the environmental dynamics. However, a drawback of that study was that learning was conducted off-line, i.e. navigation could be conducted only after complete learning of the environment.

In the current paper, we study the development of the interactive processes 
between learning and acting in the robot’s exploration of its environment. By 
conducting real robot experiments, we focus on how the robot interacts with its 
environment and how it makes sense of the world by utilizing its limited experiences. Our experiment exhibits an interesting result: diverse exploratory behaviors are generated by taking advantage of the state of confusion in the internal model midway through the learning process. Our analysis, based on the dynamical systems scheme, clarifies the underlying mechanism.

2 The Model 

In this section we introduce a neural net model which enables the system to perform exploratory behavior, goal-directed planning and behavior-based learning. The neural net architecture employed has been built by combining pre-existing neural net schemes. In the learning process, both reinforcement learning and prediction learning are conducted [11]. Using reinforcement learning, the action policies that yield better rewards are reinforced, and the most preferred action in the current state is selected accordingly. In prediction learning, the forward model [3] is adapted to extract the causality between the action and the sensation. In goal-directed planning, the inverse dynamics scheme [11, 3] is applied to the forward model in order to generate possible action sequences. In this planning process, the action policy adapted by reinforcement learning provides heuristics for searching for better rewarded action sequences. In the current formulation, rewards are given to the system based on the novelty which the system experiences for each exploration action [10, 6]. In other words, when the system cannot predict the next sensation from the current action, the current action is rewarded. In addition, the prediction learning attempts to learn to predict how much prediction error it will make. By combining this novelty-rewarding scheme with the reinforcement learning and prediction learning schemes, the system tends to explore the workspace regions with which it is unfamiliar. As the novelty-rewarding scheme continues to bring new experiences to the system, the system is forced to operate in a nonequilibrium state in which learning as well as acting cannot always be rationalized. The main purpose of this modeling is to investigate the possible interplay between exploration and learning when the system develops in a nonequilibrium dynamical manner.
2.1 The neural net architecture

An RNN architecture is employed in our model, as shown in Fig. 1. The RNN receives the current sensory input $s_t$, the current reward signal $r_t$ and the current action $x_t$. The RNN then outputs the prediction of the next sensory input $\hat{s}_{t+1}$, the prediction of the reward signal $\hat{r}_{t+1}$, and its preference for the next action, which is expected to obtain the maximum reward in the future. For the novelty rewarding, the current normalized prediction error for the sensory inputs is used to evaluate the current novelty reward. Note that the reward is generated internally, and we observe that the RNN learns to predict it (see Section 3.2). The RNN has context units in the input and output layers in order to account for the internal memory state (see Ref. [8] for more details of the role of context activation in navigation learning).
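The text states only that a normalized sensory prediction error serves as the novelty reward. As a minimal sketch, assuming each sensory unit is scaled to [0, 1], the reward could be the mean squared prediction error over the sensory units, which then also lies in [0, 1]; the normalization choice and the name `novelty_reward` are assumptions.

```python
def novelty_reward(predicted, actual):
    """Hypothetical novelty reward: mean squared prediction error over the
    sensory units, assuming every unit is scaled to [0, 1]."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sum(squared) / len(squared)  # 0 when fully predictable, up to 1 when maximally surprising
```

A perfectly predicted sensation yields no reward, so repeating a familiar travel gradually stops paying off.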



Fig. 1. The RNN architecture. Inputs: sensory input, reward, action and context; outputs: next sensory input, next reward, next preferred action and context.



(A) Learning: The RNN learns to predict the next sensory inputs and rewards from the current sensory inputs, the selected action and the internal state. This corresponds to the forward model learning. The preference
for the next action is learned by a variant of the profit sharing method [2] in 
order to propagate the decayed reward signal backwards in time. This means 
that if the current action selection leads to an unpredictable experience, this 
action selection is reinforced. This corresponds to reinforcement learning. Both 
learning processes are executed in the RNN using the back-propagation through 
time (BPTT) algorithm. 
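The exact profit-sharing variant is not specified here. The sketch below shows only the generic idea of propagating a decayed reward backwards in time: each step's credit is its own reward plus the decayed credit of the following step. It assumes `rewards[k]` is the novelty reward obtained after the k-th branching decision; such credits could serve as training targets for the preference output. The function name is hypothetical.

```python
def profit_sharing_credit(rewards, decay):
    """Credit for step k: credit[k] = sum over t >= k of decay**(t - k) * rewards[t]."""
    credit = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards in time so each reward flows to every earlier decision
    for k in range(len(rewards) - 1, -1, -1):
        running = rewards[k] + decay * running
        credit[k] = running
    return credit
```

For instance, a single surprising outcome at the end of a three-step episode with decay 0.5 gives credits [0.25, 0.5, 1.0], so earlier branching choices are reinforced more weakly.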

(B) Planning: The objective of planning is to find the action plan $x^* = (x_0, x_1, \ldots, x_T)$ which generates the path that maximizes the future cumulative reward. The action sequence is dynamically computed using contributions both from the forward model part and from the action policy part. Inverse dynamics [3] are applied to the forward model in order to obtain the update $\Delta x^*$ of the action plan that maximizes the cumulative reward expected in the future sequence. We consider the following energy function, obtained by taking the negative of the cumulative reward from the current time step to the terminal step $T$:

$$E_m(x^*) = -\sum_{i=0}^{T} \alpha^{i}\, \hat{r}_{i+1} \qquad (1)$$

where $\alpha$ is the decay coefficient of the reward. The back-propagation through time (BPTT) algorithm [5] is used to compute the update to the action sequence that minimizes the energy in the forward model part. In addition, the action policy influences the planning dynamics in that the difference between the preferred action and the planned action at each step is minimized. The update to the action at each future step is obtained by summing both contributions and adding Gaussian noise $\eta$. The update to the action plan is therefore

$$\Delta x_i = \epsilon \left( -\frac{\partial E_m}{\partial x_i} + (\hat{x}_i - x_i) \right) + k_n \eta \qquad (2)$$

where $\hat{x}_i$ is the action preferred by the action policy at step $i$, $\epsilon$ is the update rate and $k_n$ is the noise gain. The Gaussian noise term is employed to prevent the plan dynamics from being trapped in a local minimum. The value of $k_n$ is changed in proportion to the value of $E_m$; therefore the plan search dynamics settle when the energy is minimized and otherwise remain active. The reader is reminded that the contributions to the update from the forward model and from the action policy do not always agree with each other in the course of exploration, since the overall system is characterized by highly nonlinear and nonequilibrium dynamics.
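To make the structure of the planning update concrete, here is a toy sketch under several assumptions: actions are continuous values in [0, 1] (a relaxation of the branching bit), a hypothetical `predict_rewards` function stands in for the RNN forward model, the gradient is taken by finite differences instead of BPTT, and, assuming predicted rewards lie in [0, 1], the noise gain is made proportional to the gap between $E_m$ and its lower bound so that the search settles as the energy approaches its minimum.

```python
import random

def energy(plan, predict_rewards, alpha):
    """E_m(x*) = -sum_i alpha**i * r_hat_{i+1}; predict_rewards is a stand-in forward model."""
    return -sum(alpha ** i * r for i, r in enumerate(predict_rewards(plan)))

def plan_update(plan, preferred, predict_rewards, alpha=0.9, eps=0.1,
                noise_gain=0.05, h=1e-4, rng=random):
    """Apply one noisy update step to every action of the plan."""
    e0 = energy(plan, predict_rewards, alpha)
    # Assumed lower bound of E_m when every predicted reward equals 1
    e_min = -sum(alpha ** i for i in range(len(plan)))
    k_n = noise_gain * max(0.0, e0 - e_min)  # noise fades as E_m nears its minimum
    new_plan = []
    for i, x in enumerate(plan):
        bumped = list(plan)
        bumped[i] = x + h
        # Finite-difference estimate of dE_m/dx_i (stand-in for BPTT)
        grad = (energy(bumped, predict_rewards, alpha) - e0) / h
        # Forward-model term plus pull towards the policy's preferred action, plus noise
        dx = eps * (-grad + (preferred[i] - x)) + k_n * rng.gauss(0.0, 1.0)
        new_plan.append(min(1.0, max(0.0, x + dx)))  # keep actions in [0, 1]
    return new_plan
```

With a toy predictor that rewards larger action values and a policy that also prefers 1.0 everywhere, repeated updates drive the plan to all ones, where the noise term vanishes.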

(C) Incremental learning by consolidation: After each travel is terminated, the robot incrementally learns what it has experienced, using the so-called consolidation learning scheme [9], which was inspired by the biological observation of memory consolidation [7] during sleep in mammals. In our system, a new episodic sequence experienced in the current travel is stored in a temporary memory. In the consolidation process, the RNN generates imaginary sensory-action sequences by rehearsing from its pre-learned long-term memory. This rehearsal can be performed by repeating “planning”, as described above, without actually moving, as in dreaming. Then the RNN is re-trained simultaneously on the new episodic sequence stored in the temporary memory and on the rehearsed sequences generated from the pre-learned memory. This combination of rehearsal and learning allows the memory system to be re-organized without suffering catastrophic interference between the novel experiences and the pre-learned memory.
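The mixing step of consolidation can be sketched as follows: the new episode is combined with sequences rehearsed from the current network, and the combined set is shuffled before re-training. `rehearse` is a hypothetical callable standing in for running the planner without moving; the default of 10 rehearsals follows the task setting described in Section 3.1.

```python
import random

def build_consolidation_set(new_episode, rehearse, n_rehearsals=10, rng=random):
    """Return a shuffled training set that mixes the new episode with rehearsed ones.

    `rehearse` is a hypothetical callable returning one imaginary
    sensory-action sequence generated by the current network.
    """
    batch = [new_episode] + [rehearse() for _ in range(n_rehearsals)]
    rng.shuffle(batch)  # interleave old and new material for re-training
    return batch
```

Training on the shuffled mixture, rather than on the new episode alone, is what protects the previously learned sequences from being overwritten.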



3 Experiment 

3.1 Task setting 

A mobile robot, shown in Fig. 2(a), is used for the experiment. The robot is equipped with range sensors and a color vision camera on its head. Fig. 2(b) shows the adopted workspace.

Fig. 2. (a) The robot employed in the experiment and (b) the adopted workspace.

The basic behavior of the robot is determined by a set of pre-programmed action modules for actions such as wall-following, wall-switching and colored-object-approach. The RNN switches between these actions by branching. Two cases of branching are considered: (a) the robot, after turning a corner, determines whether to continue following the current wall on its left side or instead to leave the current wall and move forward diagonally at 45 degrees to the right until it encounters another wall; and (b) the robot, after finding a colored object, determines whether to continue wall-following or to approach the colored object. In this setting, the action can be represented by one bit of information, which represents whether or not to branch. The RNN receives two types of sensory input at each branch point. One is the travel vector, which represents the distance and the direction the robot has traveled since the previous branch. These values are measured by taking the sum of and the difference between the rotation angles of the left and right wheels. The other sensory input is the categorical output of the visual image obtained when the robot encounters a colored object. The robot plans its future action sequence dynamically while it travels and receives the sensory inputs at each branch encountered. The robot starts its exploration travel from a fixed home position, and the exploration is terminated when the travel takes it outside a predefined boundary (the home position and boundary used in our experiment are shown in Fig. 2(b)). At the moment of termination, the RNN receives the termination signal in its sensory input, and the robot is brought back to the home position manually. Following this, the consolidation process takes place, in which the temporarily stored sequence is learned together with 10 rehearsal sequences. After the consolidation, exploration by the robot is resumed.
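The travel vector is measured from the sum and the difference of the wheel rotation angles. The sketch below uses the standard differential-drive relations to turn those two quantities into a distance and a change of heading; the wheel radius and axle length are hypothetical parameters not given in the text, and angles are in radians.

```python
import math

def travel_vector(left_angle, right_angle, wheel_radius, axle_length):
    """Distance and heading change since the last branch from wheel rotations.

    The sum of the wheel angles gives the distance travelled; their
    difference gives the change of heading (positive = turned left).
    """
    distance = wheel_radius * (left_angle + right_angle) / 2.0
    heading_change = wheel_radius * (right_angle - left_angle) / axle_length
    return distance, heading_change
```

Equal wheel rotations give a straight segment with zero heading change, while unequal rotations encode a turn, which is the directional information the RNN receives at each branch.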

3.2 Results 

The robot performed 20 exploration travels in each experiment, and the experiment was conducted three times under the same conditions. Fig. 3 shows the average prediction error for each travel sequence in the three experimental cases.

Fig. 3. The history of the prediction error for the three experiment cases (experiment-1, experiment-2 and experiment-3; horizontal axis: travel sequence).

In all three cases, the prediction error decreases gradually on average as the exploration proceeds.

It is interesting to observe the rehearsal during consolidation learning, since the content of the rehearsal represents what the robot has learned so far. Fig. 4 shows how the diversity of the rehearsed plans in each consolidation learning process changes as the exploration proceeds.

Fig. 4. Changes in the diversity of the rehearsed plans during the three exploration experiments (horizontal axis: travel sequence).

The lower graphs in the figure show the IDs of all rehearsed plans generated during each consolidation learning period; the upper graphs represent the corresponding predicted rewards of the generated plans. (An ID is assigned to each generated plan by encoding the bit pattern of its branching sequence, a maximum of 10 time steps in length, into a number from 0 to 512.) The diversity of the plans increases and the predicted reward decreases as the exploration continues. We observed that the rehearsed plans are generated not just by repeating previously experienced sequences but by combining them into new ones. Since rehearsal directly affects the re-organization of the learned contents, the diversity generated in rehearsal leads to diversity in the actual travel.

In the following, we examine how diverse travel sequences are generated in the course of exploration. Fig. 5 shows all 20 trajectories of the robot's travel observed in one experimental case (experiment-1).

Fig. 5. The trajectories of the robot's exploration travels for one experimental case. The travel sequence number is given for each trajectory.

In the initial period of the exploration, the robot tends to repeat the same branching sequences. As is evident in Fig. 5, the same trajectory is repeated for the first two travel sequences. For the third sequence, the branching changes and a different trajectory is generated. This trajectory is repeated in the next two travel sequences. The trajectory in the sixth travel sequence appears to be generated by combining the two travel sequences previously experienced. In summary, the novelty-rewarding scheme causes the observed repetitions and variations in the travel. When the robot undergoes a travel sequence it has not experienced before, the branching sequence experienced is reinforced strongly because of its unpredictability. When the same trajectory is repeatedly generated through reinforcement, the sequence becomes predictable and is rewarded less. As a result, the probability of modifying the current travel increases.

An interesting question is how novel action sequences are generated in the planning process. We found that novel branching sequences originate not merely from the noise term in the planning dynamics but also from the internal confusion caused by incremental learning. This point is illustrated by an example from the 10th travel sequence. In this travel sequence, the robot, starting from the home position, continued to follow the wall after passing corner1, and then branched to another wall after passing corner2. This branching at corner2 was a novel experience for the robot. We investigated how this branching decision was generated by examining the recorded planning process. Fig. 6 shows the actual planning process that took place immediately before the branching was made at corner2.

Fig. 6. (a) The time history of plan generation at corner2 (horizontal axis: time step); (b) the predicted reward of the corresponding plan.

In Fig. 6(a), each column consisting of white and black squares represents a branching sequence plan at each time step of the planning process, where black and white squares denote branching and non-branching, respectively. Fig. 6(b) indicates the predicted reward for the generated plan. At the beginning of the planning process, a plan of not branching twice is generated with a low predicted reward. If actually realized, this plan would repeat the 5th travel sequence. At the end of the planning process, plans are generated in which branching actions occur repeatedly after passing corner2, with the expectation of a higher reward, even though such action sequences have never been experienced. Note that this type of plan was not observed when the robot approached the same corner in its earlier travels. Further examination showed that the lookahead predictions of the sensory sequences after branching at corner2 and at corner1 are mostly the same. This can be interpreted as meaning that the robot hypothesized that branching at any corner would lead to better chances of encountering novel experiences, because it applied the situation after branching at corner1 to the situation at corner2. (Indeed, the travel will continue as long as branching is selected at the approaching corners, without terminating by going out of the workspace boundary.) We conclude that the novel action of branching at corner2 results from the expectation of a higher reward that is falsely anticipated by means of fake memory generated in the course of consolidating immature experience. This phenomenon of novel action trials being generated by fake memory and internal confusion was frequently seen in the middle of the learning process.

Finally, we investigated how the internal model develops by examining the evolution of the RNN attractor. Fig. 7 shows the attractors that appeared in the phase space of the RNN at different stages of experiment-1. The phase plots were drawn by iteratively activating the RNN in closed-loop mode with inputs comprising 4000 steps of random branching action sequences. The generated sequence of context unit activations is plotted in a two-dimensional phase space. In Fig. 7, cluster structures consisting of multiple segments are clearly seen in the later periods of the exploration travel.

Fig. 7. The RNN attractor that appeared at certain stages of the learning process in experiment-1: (a) 7th, (b) 11th, (c) 15th, (d) 18th. The learning stage is given at the base of each plot.

Our examination clarified that this set of cluster segments represents the global attractor. Further analysis indicated that in the phase plots in Fig. 7(c) and Fig. 7(d) each segment corresponds uniquely to a branching position in the workspace, and that the graph structure of the state transitions in the phase space is topologically equivalent to that of the branching of the robot trajectories in the environment. In this condition, we say that “dynamical closure” is generated in the attractor, since an equivalent closed graph structure is generated in the phase space. However, such structures were barely seen in the phase plots in Fig. 7(a) and Fig. 7(b). While the learning process is “immature”, the shape of the attractor varies substantially after each learning phase, the neural dynamics exhibit diverse trajectories in the phase space, and the robot behaves as if it were confused. Meanwhile, the attractor develops step by step as the diverse exploration is repeated, and finally dynamical closure is organized in the internal neural dynamics.
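The phase-plot procedure can be sketched roughly as follows. A toy two-unit recurrent context layer with arbitrary, hypothetical weights stands in for the trained RNN; it is driven for 4000 steps by random branching bits, and the successive context activations are collected as 2-D points for plotting. The real network would also feed its own sensory predictions back as inputs in closed-loop mode, which this sketch omits.

```python
import math
import random

def phase_plot_points(w_ctx, w_in, bias, n_steps=4000, seed=0):
    """Collect 2-D context activations of a toy recurrent layer driven by
    random branching bits (a stand-in for the trained RNN)."""
    rng = random.Random(seed)
    c = [0.0, 0.0]
    points = []
    for _ in range(n_steps):
        branch = rng.randint(0, 1)  # random one-bit branching action
        # New context computed from the previous context and the action bit
        c = [math.tanh(w_ctx[j][0] * c[0] + w_ctx[j][1] * c[1] + w_in[j] * branch + bias[j])
             for j in range(2)]
        points.append(tuple(c))
    return points
```

Scatter-plotting the returned points for a trained network, rather than this toy layer, would reveal whether the activations settle onto a few segments as in Fig. 7(c) and (d).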
4 Discussion and Conclusion

In the experiments, it was shown that the robot learned incrementally about 
its workspace through exploration and that the robot was eventually successful 
in obtaining a rational model of the workspace. However, the emphasis in this 
study is on the observation of dynamical processes before the rational model 
is achieved. In the beginning, a few travel sequences are repeated and later 
some combinations of them are made. In the middle period, novel actions are 
frequently tried with a false expectation of the future consequences. The confusion due to this immaturity turns out to be beneficial, since it acts as a catalyst for generating the diverse behavior required to explore the environment. Such diverse behavior enables the robot to acquire the rational model later.

Our experimental studies, however, are limited in the sense that (a) the robot is manually recovered when it goes out of the workspace boundary, and (b) the environment is static. Our future study will address these problems.
Abstract. With a few exceptions, today’s mobile robots, however complex, are not truly autonomous. At some time, they all require humans to supply them with energy and/or information; most also require other forms of assistance. In complete contrast, even the simplest animals are totally self-sufficient. We describe a current project which aims to construct autonomous robots with animal-like self-sufficiency in terms of both energy and information. The robots will live free on agricultural land, hunting and catching slugs and fermenting the corpses to produce biogas, which will fuel the generator providing the robots with power.



1 Introduction 

During the last two decades much research has been carried out into the design and 
control of so-called autonomous robots. However, most of these robots still require 
some intervention from humans in order to carry out their task(s). Forms of human 
intervention include supplying information and energy, physically assisting the robot, 
and modifying the environment to suit the robot(s). For robots to be truly autonomous 
they would have to be able to carry out their entire mission without human 
intervention. There are of course a few examples of robots which achieve a high 
degree of autonomy, in that they carry enough fuel for their mission or can use radiant 
energy from their environment, and can control themselves without human 
intervention. Examples include missiles, smart torpedoes, and some spacecraft. In 
addition some automated cleaning and materials handling AGVs use opportunity 
battery charging to achieve a degree of autonomy. But while we might be inclined to 
congratulate ourselves on our achievements in improving the autonomy of robots, we 
must also recognise that even the simplest animals exhibit a degree of self-sufficiency 
and independence which is immeasurably superior to that of the best of our robots. 

This project represents an attempt to design and construct a robot system, with 
energetic and computational autonomy comparable to an animal system (as [10] 
urges). In order to make the mission a real technical challenge, we decided that the 
system would have to obtain its energy in the same way as most animals - by finding 
and 'digesting' organic material. Such natural resources are found in particular types 
of places, and are destroyed by being used; the process of foraging for food must
therefore deal with the issue of where and when to look for food, when to revisit an 
area where food was previously found, when to abandon a site which is not producing 
food, and so on. Of course, the organic resources, or food, must be converted to a 
form of energy that the robot system can use. We propose to convert the organic 
material to electricity by first fermenting it to obtain bio-gas, and then using this bio- 
gas to power an internal combustion engine driving a generator. Using modern engine 
management techniques, it is possible to use bio-gas with as little as 25% methane, 
and to recover up to 45% of its energy. 

The organic energy source should be a mobile animal species, since the technical 
problems of predation are much more challenging than grazing on plants. However, 
the prey species should be reasonably plentiful and not require rapid pursuit, which 
would be difficult to achieve at a reasonable energy cost. Since there would be a 
certain energy cost associated with finding, catching, and consuming any creature, 
however large or small, the prey species should be reasonably large, to give a 
reasonable margin over energy expended. Finally, it should preferably be a pest 
species subject to aggressive control measures, so that the system could be perceived 
as doing something of actual use that would have to be done anyway. All of the above 
criteria are met by the slugs found on agricultural land, especially Deroceras 
reticulatum [8]. They are slow moving, plentiful, large, and destructive - UK farmers 
spend over £20m per annum on molluscicides and spreading them [1]. Slugs are also 
potentially more suitable for fermentation than some other possible target species: 
they do not have a hard shell or exoskeleton and have a high moisture content. 

The agricultural ground where slugs are pests usually takes the form of a well 
cultivated seedbed containing winter wheat or potatoes [2]. Such ground is soft - so 
soft that moving a heavy fermentation vessel over it would consume large amounts of 
energy. The fermentation vessel/gas engine/generator system will therefore be fixed, 
and the robots will deliver slugs to it, and collect power from it rather like some social 
insect colonies. Because the fermenter will require a certain amount of energy to 
cover its operating overhead, it seems extremely unlikely that a single robot would be 
able to gather enough energy to service both its own and the fermenter’s 
requirements; we will therefore require a multiple robot system. This confers other 
potential advantages: some search tasks are more efficiently performed by a number 
of communicating robots than by a single robot; perhaps more importantly, the 
multiple approach offers the potential for achieving reliability through redundancy. This paper describes the progress made to date in the design and construction of the robots; work on the fermenter is to begin shortly.



2 The Robots 

The key features of the robots are that they must be energy efficient, operate in unmodified agricultural fields, and be protected from the weather, slime, and mud. To minimise energy usage whilst foraging, each robot will scan the ground and catch slugs using a sensor and gripper mounted on a long, light articulated arm; the energy required to move this arm is much less than that required to move the whole robot over rough ground. The optimal arm length is a function of the distribution of slugs and of the power required to operate arms of different lengths in relation to the power
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required to move the robot. Our calculations showed that a 1 .5m long arm would be 
the most energy efficient. The other requirements of the arm are that it should be 
light, stiff, easily controllable, and capable of moving in all directions; it should also 
have a fairly simple construction, increasing its reliability, making it easier to 
manufacture and reducing its cost. A design consisting of two 0.75m tubular sections, 
with a hinged joint between them, was chosen (see Fig. 1). To allow the arm to rotate 
around the whole robot it is mounted on a turntable located in the centre of the robot’s 
chassis. The chassis is large enough to maintain stability in all directions when the 
arm is fully extended. To meet the requirements of lightness and stiffness the arm is 
constructed from aircraft grade carbon fibre tube. To keep the arm structure light, the 
motor and gearbox required to provide movement at the elbow joint are mounted on 
the turntable; a lightweight toothed belt inside the arm transmits the drive to the 
elbow joint. Since the number of slugs on the surface peaks in the early evening and 
just before dawn, the rate of gathering them during these periods must be as high as 
possible. To this end, the arm motors and gearboxes were selected so that the arm can 
move from fully retracted to fully extended, or vice versa, in under 1.5 seconds. 
Self-locking worm gearboxes provide the required motor speed reduction and allow the 
arm to be held in position without consuming any energy. 




Fig. 1. Prototype three-fingered gripper with wiper blades and compliance gimbal 
(left), and the arm and gripper system mounted on a turntable (right) 

The arm’s end-effector is a robust lightweight gripper that is able to pick up and 
release both wet and dry slugs, regardless of their size and orientation, and any 
irregularities in the substrate. The current design consists of three fingers at 120° 
spacing, operated by a single miniature motor. As the fingers close, they meet 
underneath the slug so that it can be lifted; wiper blades ensure the slug’s release 
when the gripper is opened. Slugs are detected and targeted by a camera mounted in 
the centre of the gripper, away from slugs and mud. To allow for contour matching 
with the ground, and to ensure that the view from the camera is always perpendicular 
to the ground, regardless of the arm’s extension, the whole gripper assembly hangs 
freely on a gimbal. When scanning for slugs it is possible to lock the gripper 
assembly, thus stopping it swinging, by fully opening the gripper. Each of the three 
wiper blades has a plate on the end to allow for passive alignment with the contour of 
the ground, ensuring that all three blades move under the slug when the gripper 
closes. The gripper’s mechanism will be protected from the weather and mud by a 
flexible rubber cover. 

Locating the static fermentation station in a large muddy field, where wheel slip 
will be inevitable, will be achieved by using a combination of the Differential Global 
Positioning System (DGPS) and an active infrared localisation system [4], 
[5]. DGPS can also be used for mapping the locations of grazed areas, so that good 
patches can be found again, and over-grazing can be avoided. (This last point may not 
be a problem: a study allied to this project [3] found that removing all surface slugs 
from a location every few days does not, in the medium term, appear to reduce the 
number of available surface slugs. This is thought to be because there is a large 
reservoir of underground slugs which can rapidly replace those that are removed.) 



3 Sensing and the control strategy 

A system is required that can reliably detect slugs under sparse vegetation, 
against a background of rough earth. Several different types of sensor could 
potentially achieve this; we have chosen a vision-based system since it 
offers the best combination of size, weight, cost and effectiveness. The sensor is 
a monochrome CMOS image sensor that is lightweight and low-power (<175 mW), with 
adequate resolution (164 by 124 pixels) and high sensitivity (0.1 lux); it has a digital 
interface and can produce up to 60 frames per second. This sensor also has built-in 
adjustable automatic exposure control and is inexpensive. 

To avoid computationally intensive image processing, a method of simplifying the 
detection of slugs and of filtering out vegetation is required. Since slugs are active 
mainly from dusk to dawn some form of illumination is needed. This creates the 
possibility of using some combination of coloured light and filtering to increase the 
visibility of slugs and decrease the visibility of vegetation. We have found that this 
can be achieved by using red light from extremely bright LEDs and fixing a red 
filter in front of the image sensor. In the images received under these lighting 
conditions, vegetation and soil appear dark, whilst the slug Deroceras reticulatum 
reflects red light and thus appears very bright. Fig. 2 shows such a slug (32 mm long) 
together with some grass under white illumination. Fig. 3 is the same scene, apart 
from some movement of the slug, under red illumination: the grass now appears dark 
whilst the slug is bright. Fig. 4 shows how applying a simple threshold, 
c × (average image intensity) − k, where c and k are constants, to the red-illuminated 
image of Fig. 3 pinpoints the slug. This threshold scheme does not detect slugs under 
15 mm in length, thus filtering out smaller slugs that would be of little use to the robot. 
All the processing that is needed is the registration of large bright 
patches; clearly, further tests will be needed to establish the overall robustness of 
this system, especially with respect to slugs that are partially obscured by vegetation. 
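The threshold-and-register step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a grayscale frame given as a list of pixel rows, and the 4-connected flood fill, the minimum patch area `min_area`, and the example values of `c` and `k` are hypothetical choices made only for this sketch.

```python
from collections import deque
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame, one list per image row


def threshold_level(frame: Frame, c: float, k: float) -> float:
    """Threshold = c * (average image intensity) - k."""
    pixels = [p for row in frame for p in row]
    return c * (sum(pixels) / len(pixels)) - k


def bright_patches(frame: Frame, c: float, k: float, min_area: int) -> List[Tuple[float, float]]:
    """Centroids (row, col) of 4-connected above-threshold patches with >= min_area pixels."""
    level = threshold_level(frame, c, k)
    h, w = len(frame), len(frame[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for r0 in range(h):
        for c0 in range(w):
            if seen[r0][c0] or frame[r0][c0] <= level:
                continue
            # Flood-fill one connected bright region.
            seen[r0][c0] = True
            queue, members = deque([(r0, c0)]), []
            while queue:
                r, col = queue.popleft()
                members.append((r, col))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, col + dc
                    if 0 <= nr < h and 0 <= nc < w and not seen[nr][nc] and frame[nr][nc] > level:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Patches smaller than min_area (small slugs, glints) are discarded.
            if len(members) >= min_area:
                centroids.append((sum(m[0] for m in members) / len(members),
                                  sum(m[1] for m in members) / len(members)))
    return centroids
```

For example, on a dark 6 × 8 test frame containing a 2 × 3 bright blob and one isolated bright pixel, `bright_patches(frame, 1.5, 0, 4)` keeps only the blob and returns its centroid `(2.5, 4.0)`. Because the threshold scales with the mean intensity, a single (c, k) pair tolerates some variation in overall illumination.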




Fig. 2. Under white light 
Fig. 3. Under red light 



Fig. 4. After thresholding 
As regards the overall control strategy, the situation of a typical animal is radically 
different from that of a typical robot. At any moment, a robot usually has a single 
goal and is rarely faced with a real choice of what to do, whereas animals, and our 
robots, which are in the same free-living situation, have several simultaneous 
goals. Slugs must be gathered; batteries must be recharged; the robot must not get lost; it 
must always have enough charge to be able to return to the refuelling point; it must 
maintain the functionality of its sensors and effectors; and so on. How can we 
approach the task of programming the robots in our system so that they always act to 
maximise expected survival time? Our strategy, as designers, must be to find a 
computationally feasible solution which gives adequate performance. We know that 
because of the inefficiency of the digestive process, our system will at best be on the 
borderline of survivability, and so our performance requirements may be even more 
severe than those which an animal living exclusively on slugs would experience. 

Unfortunately, we lack the detailed information which might allow us to arrive 
immediately at a specific and optimal solution. We are therefore forced into taking a 
more general approach. The chosen solution has been to adopt a model of motivation 
and action selection in animals and robots (the d,r,k model) which has been developed 
over several years [6], [7], [9]. It is suitable for application on robots because it has 
been developed by considering animals as if they were robots, and vice-versa. It 
provides a formalism within which we can represent the level of motivation which an 
animal or robot might possess in a given situation in respect of what a roboticist 
would think of as its various goals, and enables the rational selection of one of a 
potentially large number of actions as a function of a relatively small number of 
variables and parameters. While the problem of parameter estimation remains, it 
provides a clear and principled functional framework within which to operate. 
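The equations of the d,r,k model [6], [7], [9] are not given here, so the following is only a generic sketch of motivation-based action selection in the same spirit. It is not the published model. Each candidate action's tendency is assumed to be the product of an internal need, the strength of an external opportunity, and a fixed per-action weight, and the action with the highest tendency wins. All names and numbers below are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Action:
    name: str
    deficit: float  # hypothetical: internal need served by this action (0..1)
    cue: float      # hypothetical: strength of the external opportunity (0..1)
    weight: float   # hypothetical: fixed per-action gain parameter


def tendency(action: Action) -> float:
    # Assumed multiplicative combination: an action is favoured only when
    # there is both an internal need and an external opportunity for it.
    return action.deficit * action.cue * action.weight


def select(actions: List[Action]) -> Action:
    """Winner-take-all: return the action with the largest tendency."""
    return max(actions, key=tendency)


actions = [
    Action("gather_slugs", deficit=0.3, cue=0.8, weight=1.0),
    Action("recharge", deficit=0.9, cue=0.5, weight=1.2),
    Action("return_to_base", deficit=0.9, cue=0.1, weight=1.5),
]
print(select(actions).name)  # recharge
```

The multiplicative form means a pressing need with no opportunity, or an opportunity with no need, produces little drive. Only a small number of variables and parameters is needed, however many actions there are.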
Abstract: This paper proposes a novel adaptation method for a behavior-based 
locomotion robot. Using a hierarchical behavior controller makes the controller 
design process easier and shorter, because the designer can deal with the 
behavior controllers for simple behaviors independently, and, once the 
fundamental behavior controllers have been designed, coordinate them so that 
the robot performs the objective behavior. Some problems still remain. One is 
how to adjust the behavior coordinator when the objective behavior or the robot 
parameters change slightly. We propose a novel method to adjust the behavior 
coordinator to such changes. This method measures the effects of the fundamental 
behavior controllers on the total behavior, and changes their activation values 
within relatively few trials. The proposed method is applied to the control of a 
real brachiation robot. This brachiation robot has a redundant mechanism, with 
14 actuators, for locomoting from branch to branch like a long-armed ape. 



1 Introduction 

In recent years, many studies of brachiation-type locomotion robots have been 
carried out. The brachiation mobile robot (BMR) is a mobile robot that dynamically 
moves from branch to branch like a gibbon (a long-armed ape), swinging its 
body like a pendulum [1], [2]. Saito et al. [3]-[5] proposed a heuristic learning method 
for generating feasible trajectories for a two-link brachiation robot. Fukuda et al. [6] 
proposed a self-scaling reinforcement learning algorithm that generates feasible 
trajectories that are robust against some disturbances. These studies do not use a 
dynamics model of the two-link brachiation robot directly. On the other hand, Nakanishi 
et al. [7] took another approach, using target dynamics, to control an underactuated 
system: the two-link brachiation robot is an underactuated system with two degrees of 
freedom and one actuator. As a two-dimensional extension of this model, a seven-link 
brachiation robot was studied by Hasegawa et al. [8]. This seven-link brachiation robot 
is given redundancy for locomotion, so that it can move like a real ape 
in the plane. In that study, a hierarchical behavior architecture was adopted to design 
the multi-input, multi-output controller efficiently. The behavior controllers and 
their coordinators in the hierarchical structure were generated using reinforcement 
learning. The concept of the hierarchical behavior controller is based on behavior-based 
control, which has the advantage of creating higher-level behaviors from simpler 
behaviors in a complex system. However, the learning process for acquiring the behavior 
controllers and the behavior coordinator is hard to apply to a real robot, because many 
trials are needed to tune them. Even when only a small adjustment of the controller is needed 
because the objective task or the robot parameters have changed slightly, reinforcement 
learning needs many trials, because it uses no structured information about the 
relations between each behavior and the total behavior. 

In this paper, we propose a novel adaptation method to adjust the behavior 
coordinator to small changes in the objective task or the robot's environment. This 
method measures the relations between each behavior and the total behavior, and 
then determines the direction in which to change the activation values of the behavior 
coordinator based on these measurements. We apply this adaptation method to the control 
problem of Brachiator III. This robot has 13 links and 12 joints 
and can perform ape-like motions in three-dimensional space. Its 
controller is designed using a behavior-based approach. We show the effectiveness 
of the proposed algorithm through experiments using Brachiator III. 



2 Outline of 13-link Brachiation Robot 

The motion of conventional brachiation robots is limited to a two-dimensional 
plane, and such motions are far from real ape behaviors. We therefore 
built a brachiation robot with 13 degrees of freedom in order to realize dynamically 
dexterous behaviors. The dimensions and joint ranges of motion of this robot were 
designed by simulating a real long-armed ape. The robot is about one kilogram heavier 
than a real ape because of actuator limitations. Designing and adjusting the 
controller is very hard, because the motions become much more complex than 
motions in a plane. 

2.1 Mechanical Structure of Brachiator III 

Brachiator III consists of 13 links, 13 degrees of freedom, and 14 motors, 
including two motors for the two grippers. Each joint is driven by a DC motor through a 
wire, so that the weight distribution can be matched to that of a real ape by adjusting 
the locations of the DC motors. The appearance of Brachiator III is shown in Fig. 1. 




Fig. 1 Brachiator III 



2.2 Motion Control Based on Behavior-based Architecture 

The hierarchical behavior controller for the brachiation robot is designed based on a 
behavior-based architecture. The objective complex behavior is assumed to be 
decomposable into simpler behaviors, and a higher-level behavior is performed by 
coordinating simpler behaviors that have already been obtained. The architecture of the 
hierarchical behavior controller of the robot based on this concept is shown in Fig. 2. 
First, the brachiation behavior is divided into a swing action, which stores 
sufficient energy prior to the transfer (preliminary swing mode), and a locomotion 
action, which is the actual transfer operation (locomotion mode). These actions are 
then decomposed into the fundamental behaviors: leg swing, body rotation 1, leg stretch, 
body rotation 2, body lift and arm reaching. 




Fig. 2 Hierarchical behavior controller for brachiation robot 
2.3 Desired Trajectory From Behavior Controller 

All behavior controllers except the arm reaching behavior controller are 
feedforward controllers, which output original trajectories expressed as cubic 
spline functions. The desired trajectories for the connected actuators are generated by 
rescaling these trajectories with the activation values from the behavior coordinator 
in the higher layer. The formulas are 

$$y_{d,i}(t) = r_k \left( y_k(t) - b_k(t) \right) \qquad (1)$$

$y_{d,i}(t)$: desired trajectory to actuator $i$ 
$r_k$: activation value for behavior controller $k$ 
$y_k(t)$: actuator trajectory from behavior controller $k$ 

$$b_k(t) = b_k(0)\,\frac{t^{*} - t}{t^{*}} + b_k(t^{*})\,\frac{t}{t^{*}} \qquad (2)$$

$t^{*}$: time when the behavior is finished 




Fig. 3 Desired trajectory to connected actuator 



2.4 Motion Measurement of Brachiation Robot Using Real-time Tracking 
System 

The location of the tip of the free arm and the location of the center of gravity of 
the robot cannot be obtained from the joint sensors, because the slip angle between the 
catching gripper and the branch cannot be measured directly using a potentiometer or 
rotary encoder. We therefore use a real-time tracking system, "Quick MAG System III", 
which measures the three-dimensional locations of eight points at a 60 Hz sampling 
frequency using two CCD cameras. 

The eight measuring positions shown in Fig. 4 are chosen to calculate the center of 
gravity of the robot under the following assumptions: 

1. The elbow of the catching arm is kept straight. 

2. Both legs are controlled to perform the same motion. 

3. The two shoulder joints are adjacent and located at almost the same position. 









Fig. 4 Measuring points for calculating the center of gravity 





Fig. 6 Transfer of the center of gravity calculated from the measuring points (center of mass in the X, Y and Z directions) 
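Once the tracked points have been used to locate the robot's body segments, the center of gravity is a mass-weighted average of the segment centers. The sketch below shows only that final step. The segment names, masses and positions are hypothetical, and deriving each segment center from the eight markers is left abstract. Under the assumptions listed above, the straight catching arm can be treated as one rigid segment and the two legs as a single combined segment.

```python
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


def center_of_gravity(segments: Dict[str, Tuple[float, Vec3]]) -> Vec3:
    """Mass-weighted mean of segment centres: sum(m_i * p_i) / sum(m_i).

    `segments` maps a (hypothetical) segment name to (mass, centre position),
    where each centre position would be derived from the tracked marker points.
    """
    total = sum(mass for mass, _ in segments.values())
    if total <= 0:
        raise ValueError("total mass must be positive")
    return tuple(
        sum(mass * pos[axis] for mass, pos in segments.values()) / total
        for axis in range(3)
    )
```

For instance, a 1 kg segment at the origin and a 3 kg segment at (4, 0, -2) give a center of gravity of (3.0, 0.0, -1.5).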
3 Adaptation for Behavior-Based Controller 

Unsupervised learning methods, e.g. genetic algorithms, evolutionary 
programming and reinforcement learning, need many trials to obtain the desired 
behavior. It is therefore effective to use such methods in 
computer simulations. However, a real robot using a controller obtained in computer 
simulation may not achieve the desired performance, because of errors in the model 
used in the simulation and because of disturbances. An adaptation or tuning 
method that compensates for such errors and disturbances should be applied to improve 
stability. An adaptation method is also required when the parameters of the 
robot or the objective task change slightly. The proposed adaptation method for 
behavior coordination is explained in this section. 



3.1 Adaptation Problem for Behavior-based Controller 

The brachiation robot is controlled by the hierarchical behavior controller 
explained above. To adjust the total behavior to small changes in the robot 
parameters or the desired behavior, there are three approaches: adjusting the 
fundamental behavior controllers, the behavior coordinator, or both. The easiest 
approach is to adjust the behavior coordinator, because its search space is small. If 
adjusting the behavior coordinator is not enough to achieve the desired behavior, both the 
behavior controllers and the behavior coordinator should be rearranged. The proposed 
method concerns the adaptation of the behavior coordinator only. To 
adjust the behavior coordinator, we need to evaluate the effect of each behavior 
controller on the whole behavior. We therefore propose a method for approximately 
measuring the relations between each behavior and the total behavior through 
several trials, together with a method for updating the activation values of the behavior coordinator. 



3.2 Adaptation Algorithm 

The behavior coordinator outputs the activation values, which determine the 
amplitude of the desired trajectories from the behavior controllers. By adjusting the 
activation values, the total behavior can be improved to some extent. The relation 
between the activation values and the resulting total behavior is strongly nonlinear; 
however, we assume that, only in a neighborhood of the current state, it can be 
expressed as the product of the degrees of contribution and the activation values, 
eq. (3). Three parameters are selected as representative indices of the robot's 
performance: the maximum angle of the center of gravity, and the distance R in the y-z plane 
and the x-directional distance between the free hand and the target branch, shown in Fig. 7. 
When the objective task or the robot parameters change slightly, the robot will fail, 
because differences arise between its current performance and the new target 
performance. These differences are expressed by eq. (7), derived from eq. (3). The 
algorithm is as follows. 
1. First, to measure the degree of contribution of each behavior controller to the three 
index parameters, add a perturbation Δr to each current activation value in turn and 
make one trial per perturbation. 

2. Using eqs. (8), (9) and (10), calculate the degree of contribution of each 
behavior controller. 

3. Calculate the new activation values using eq. (11). 

4. Evaluate the performance by making a trial with the new activation values, and 
explore their neighborhood, R′ and R″. 

5. Update the degree of contribution of each behavior controller with the new 
activation values using eq. (12). 

6. Go to step 3. 
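Equations (3) and (7)-(11) are not reproduced in this excerpt, so the following sketches only one plausible reading of steps 1-3. Each column of a contribution matrix W is estimated by a finite difference from a single perturbed trial. The activation change is then obtained by solving W·Δr = ΔE for the required change ΔE in the three index parameters. The `run_trial` callback, which would run one real or simulated brachiation trial and return the three indices, and the toy linear "robot" in the test are hypothetical.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("contribution matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= f * m[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def estimate_contributions(run_trial: Callable[[Vector], Vector], r: Vector, dr: float) -> Matrix:
    """Steps 1-2 (assumed finite-difference form): W[i][k] ~ dE_i / dr_k."""
    base = run_trial(r)
    w = [[0.0] * len(r) for _ in base]
    for k in range(len(r)):
        perturbed = r[:]
        perturbed[k] += dr  # one trial per perturbed activation value
        e = run_trial(perturbed)
        for i in range(len(base)):
            w[i][k] = (e[i] - base[i]) / dr
    return w


def next_activation(w: Matrix, r: Vector, e_now: Vector, e_target: Vector) -> Vector:
    """Step 3 (assumed form): pick the change dr with W dr = e_target - e_now."""
    delta_e = [t - c for t, c in zip(e_target, e_now)]
    return [ri + di for ri, di in zip(r, solve(w, delta_e))]
```

On a toy plant that really is linear, one update lands on the target. On the real robot the relation holds only locally, which is why steps 4-6 re-measure around each new point and iterate.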
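ii) if $s > 0$: 

$$W(s+1) = F(s)\,T(s)^{-1} \qquad (12)$$

where 

$$F(s) = \left( E(s) \;\; E'(s) \;\; E''(s) \right) \qquad (13)$$

$$T(s) = \left( R(s) \;\; R'(s) \;\; R''(s) \right) \qquad (14)$$

$$R'(s) = \left[\, \Delta r(s)_1,\; \Delta r(s)_2 + a,\; \Delta r(s)_3 - a \,\right]^T \qquad (15)$$

$$R''(s) = \left[\, \Delta r(s)_1 + a,\; \Delta r(s)_2 - a,\; \Delta r(s)_3 \,\right]^T \qquad (16)$$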
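A minimal sketch of the contribution update follows. It reads eq. (12) as W(s+1) = F(s) T(s)^{-1}, so that each measured index-change column E, E′, E″ is reproduced by W times the matching activation-change column R, R′, R″. That reading, the indexing of the three selected behaviors as 1-3, and the helper names are assumptions made for this illustration.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def det3(m: Matrix) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def inv3(m: Matrix) -> Matrix:
    """Inverse of a 3x3 matrix via the adjugate (transposed cofactor matrix)."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("trial directions are (nearly) linearly dependent")
    cof = [[m[(i + 1) % 3][(j + 1) % 3] * m[(i + 2) % 3][(j + 2) % 3]
            - m[(i + 1) % 3][(j + 2) % 3] * m[(i + 2) % 3][(j + 1) % 3]
            for j in range(3)] for i in range(3)]
    return [[cof[j][i] / d for j in range(3)] for i in range(3)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def columns(*cols: Sequence[float]) -> Matrix:
    """Stack 3-vectors as the columns of a 3x3 matrix."""
    return [[col[i] for col in cols] for i in range(3)]


def neighbour_steps(dr: Sequence[float], a: float):
    """R'(s) and R''(s) as in eqs. (15)-(16)."""
    return [dr[0], dr[1] + a, dr[2] - a], [dr[0] + a, dr[1] - a, dr[2]]


def update_contributions(dr, e, e1, e2, a) -> Matrix:
    """W(s+1) = F(s) T(s)^-1 with F = (E E' E'') and T = (R R' R'')."""
    r1, r2 = neighbour_steps(dr, a)
    return matmul(columns(e, e1, e2), inv3(columns(dr, r1, r2)))
```

One consequence of eqs. (15)-(16) under this reading is worth checking in practice. Subtracting R(s) from the other two columns gives det T(s) = −a²(Δr(s)_1 + Δr(s)_2 + Δr(s)_3). T(s) is therefore singular whenever the components of Δr(s) sum to zero, and the sketch raises an error in that case.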
Fig. 7 (a) The angle of the center of gravity; (b) the distance R in the y-z plane and the 
x-directional distance between the free hand and the target branch 



4 Experiments 

We demonstrate the proposed adaptation method under a change of the 
objective task. Brachiator III with the initial hierarchical behavior controller can catch 
a branch at a span of 90 centimeters. As the change of task, the span to the 
next branch is then extended to 101 centimeters. Brachiator III failed to catch the branch, 
because the oscillation energy of the center of gravity was insufficient and the body did 
not come up into the reachable region. 

We assume that the locomotion behavior coordinator is not sufficient and 
should be tuned. We therefore apply the proposed adaptation algorithm to adjust 
the locomotion behavior coordinator, which has four fundamental behavior 
controllers and outputs four activation values to them: leg stretch, body rotation 2, 
body lift and arm reaching. We set the first-step perturbation Δr to ±0.05, ±0.1 and ±0.4, 
and set a to 0.05 and ^ to 0.5. We also choose "arm reaching", 
"body lifting" and "body rotation 2" as the behaviors to be corrected. 



4.2 Experimental Results 

Three steps of the adaptation algorithm are required to find feasible 
activation values when the branch span is extended to 101 centimeters. The transitions 
of the activation values r_k through the adaptation are shown in Fig. 8 and Table 1. 
Figure 9 shows the trajectories of the tip of the free hand before and 
after adaptation. 

When Δr is set to ±0.1, a feasible brachiation motion is obtained for the changed 
task, and the index parameters also converge. These 
parameters are improved when Δr is set to ±0.05 and ±0.4. 
5 Conclusions 

In this paper, we proposed a novel adaptation method to adjust the behavior 
coordinator to small changes in the objective task. This method measures the 
effect of each behavior on the total behavior, and determines the activation values 
in the behavior coordinator for each behavior controller based on the trials. We 
applied this adaptation method to the control problem of Brachiator III and showed 
the effectiveness of the proposed algorithm, which found feasible activation 
values within several trials after a small task change. The performance of this 
method depends on the amplitude of the perturbation, which is difficult to adjust 
because of the nonlinearity of the robot's behavior and because of measurement error. 
Fig. 8 Figure (a) shows the distance from the free hand to the goal branch and the 
x-directional distance from the free hand to the center of the branch. Figure (b) shows 
the error of the angle from the target angle. 



Table 1. Transitions of m., a and ar,^ in the adaptation process 
Fig. 9 Trajectories of the free hand before and after adaptation 
Abstract. This paper gives what is probably the first systematic discussion 
of how wave processes in reaction-diffusion and excitable homogeneous 
media can be used efficiently to solve a wide range of problems in robot 
navigation. Three possible applications of chemical controllers are 
considered: (i) object following, (ii) optimal path finding, and (iii) universal 
control. The various implementations of controllers discussed here 
include Belousov-Zhabotinsky chemical processors, families of excitable 
lattices, and self-localised excitations. We present some results from a 
simulation of robot control using excitable lattices, and find the results 
encouraging for our planned construction of a chemically controlled robot. 



1 Background 

One remarkable aspect of life is that it presents innumerable instances of simple 
biological systems which can solve particular problems in the real world; many 
of these problems can also be set in a computational context. Artificial life has 
yielded many demonstrations of abstract systems which also exhibit these prop- 
erties in simulation; our focus here is on building real physical systems which 
instantiate these abstract systems in situations close to the biological source of 
inspiration — using them to control a freely moving physical agent in a physical 
environment. Our chosen technical route is chemical information processing. 

Chemical information processing is attractive from many points of view: to 
some people, it offers fine grained massive parallelism; to others, the appeal is in 
the potential robustness of the technique, or in the connection to computation in 
biological structures. Our interest is driven by the possible relationship between 
the problems which one form of chemical computation may excel at solving, and 
the problems which need to be solved by agents moving around in the real world. 
Chemical computation is of two basic types. The first occurs within domains in 
which spatial differentiation is eliminated (for example by stirring); the essence 
of the computation is in the appearance and development of linked reactions and 
compounds in which certain characteristics of identity and concentration can be 
identified as the outcome of useful computation over some problem defined by 
the initial state and external inputs. The second type exploits local changes 
and spatial differentiation, using unstirred reactors and typically employing a 
rather simpler basic chemistry than the first type. This intrinsic connection to 
locally connected structures maps well onto many of the problems which a mobile 
robot faces when moving around an unstructured environment: two and three 
dimensional representations must be processed, and movement in two or three 
dimensions must be controlled and directed. Excitable media have been studied 
for some time in the context of the possible realization of basic computational 
operations in chemical active media. For example, it has been demonstrated that 
universal logical gates can be realized in compartmentalized chemical media [9], 
and that many basic forms of image processing can be executed in parallel in 
such chemical computing devices ([13] - [16]). Universal logical gates constructed 
from chemicals held in and connected by combinations of tubes and valves are 
sophisticated examples of chemical computing elements (see e.g. [21]). The non- 
stirred active chemical media exhibit a wide spectrum of dynamic behavior, 
which could be extremely useful from a practical point of view ([3], [2], [1]). 

Of the family of chemical oscillators, the Belousov-Zhabotinsky (BZ) reac- 
tion is the most investigated, and the most suitable for laboratory experiments. 
The non-stirred layer of an oscillating BZ-reaction can be considered as a mas- 
sively parallel processor, where every elementary processor is represented by a 
micro-volume reactor. The state of the micro-volume can be identified with the 
reduced/oxidized state of the bromate component. Information processing media 
of the BZ type are specialized processors where the statement of the problem, the 
computational processes, and the results of the computations are all represented 
in the states (concentrations etc.) of the micro-volumes. The evolution over time 
of such a processor leads to a spatio-temporal dissipative structure that can be 
interpreted as the solution of the problem. The computation in a BZ processor 
takes place when waves are generated, spread and interact with each other, 
and perhaps with external representations of information (e.g. light patterns). 

The known abilities of excitable media for more complex image analysis [16] 
and for the solution of spatially based problems (e.g. shortest paths - [19], [21]; 
[14]) create the expectation that this experimental arrangement will be capable of 
demonstrating more sophisticated and complex robot behaviours such as object 
following, obstacle avoidance, and optimal path finding. 

2 Object following and obstacle avoidance: System 
requirements and results of simulations 

In order to make a chemically controlled robot which can follow some moving 
target without requiring an umbilical cable, it is necessary to accept a number 
of practical constraints. The chemical reactor (or other material realisation of 
the excitable medium) must be fitted on board the mobile robot. Local environ- 
mental information must be sensed onboard. Non-local information (if necessary) 
must either be gathered and pre-processed by an offboard vision system and com- 
municated to the robot via a high-bandwidth radio link, or must be provided 
by an onboard system using optical projection. The environmental information 
will be represented to the robot as patterns of light projected onto the reaction 
medium; the light patterns initiate and modulate waves in the medium. The 
patterns formed by the waves will be sensed and processed by an optical system, 
and the output will be mapped to the robot’s motion controller. 




Fig. 1. A snapshot of the simulation. The left, middle and right configurations 
represent lattices from the useless, adaptive and non-adaptive classes, respectively. 
The initial positions of the robots and their approximate trajectories are 
shown. 



We can make a preliminary evaluation of the potential of such a system by 
investigating a model using computer simulation. Let us consider an onboard 
array $L$ of excitable cells with an 8-cell neighbourhood such that for every cell 
$x \in L$ the neighbourhood template is defined as 
$u(x) = \{(\xi_i, \xi_j) \in \{-1, 0, 1\} \times \{-1, 0, 1\} - (0, 0)\}$. The cells take three 
states, rest ($\circ$), refractory ($-$) and excited ($+$), and change their states in discrete 
time by the rule: $x^{t+1} = +$ if $x^t = \circ$ and $C(x)^t$; $x^{t+1} = -$ if $x^t = +$; and 
$x^{t+1} = \circ$ otherwise. The predicate $C(x)^t$ is true with probability $p$ (or $p^*$) if $x$ 
lies on the boundary of the array, or if the number $e(x)^t$ of excited neighbours, 
$e(x)^t = |\{\xi \in u(x) : (x + \xi)^t = +\}|$, lies in some specified interval of excitation, 
$e(x)^t \in [\theta_1, \theta_2]$, $1 \le \theta_1 \le \theta_2 \le 8$. We should 
note that all inner cells of the lattice are excited deterministically, and so noise 
can influence only the edge cells. 

The edge cells of the lattice are subject to stochastic activity: every edge cell 
becomes excited, without any external information-bearing stimulus, with probability 
$p$ at any discrete time step. If some parts of the edge are illuminated (i.e. 
receive information), the cells belonging to these parts are excited with a higher 
probability $p^*$, $0 < p < p^* < 1$; the edge cells which are in shadow become excited 
with probability $p$. Once generated at the edges, activity patterns spread 
across the array. 
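The lattice rule above can be sketched as a synchronous cellular-automaton update. This is an illustration under stated assumptions, not the authors' simulator. The "or" in the definition of C(x)^t is read literally, so an edge cell can fire either through noise or through its excited neighbours. Cells outside the array are treated as absent. The names `theta1`, `theta2`, `p_star` and the `lit` set of illuminated edge cells are hypothetical.

```python
import random
from typing import List, Set, Tuple

REST, EXCITED, REFRACTORY = 0, 1, 2
OFFSETS = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]
Grid = List[List[int]]


def excited_neighbours(grid: Grid, i: int, j: int) -> int:
    """e(x)^t: number of excited cells in the 8-cell neighbourhood (outside cells ignored)."""
    n, m = len(grid), len(grid[0])
    return sum(1 for di, dj in OFFSETS
               if 0 <= i + di < n and 0 <= j + dj < m and grid[i + di][j + dj] == EXCITED)


def step(grid: Grid, theta1: int, theta2: int, p: float, p_star: float,
         lit: Set[Tuple[int, int]], rng: random.Random) -> Grid:
    """One synchronous update: rest -> excited if C(x)^t, excited -> refractory, else -> rest."""
    n, m = len(grid), len(grid[0])
    new = [[REST] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if grid[i][j] == EXCITED:
                new[i][j] = REFRACTORY
            elif grid[i][j] == REST:
                fires = theta1 <= excited_neighbours(grid, i, j) <= theta2
                on_edge = i in (0, n - 1) or j in (0, m - 1)
                if on_edge and not fires:
                    # Edge noise: probability p* if illuminated, else p.
                    fires = rng.random() < (p_star if (i, j) in lit else p)
                if fires:
                    new[i][j] = EXCITED
            # Refractory cells, and rest cells that do not fire, end up REST.
    return new
```

With the noise switched off (p = p* = 0) and the interval [1, 1], a single excited cell in the middle of a 5 × 5 lattice becomes refractory, and its eight neighbours become excited on the next step.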

Let the environment consist of an open arena containing a single light source; 
we require the robot to follow the light source as it moves. The robot is assumed 
to move at a constant speed; the controller may vary only the direction of motion. 
The relative position of the light source can be derived from two successive 
configurations of array activity in the following manner. Every site $x$ of the array 
is assigned a vector $v_x^t$, which is updated at every step $t$ of the temporal evolution 
of activity: $v_x^t = \frac{1}{e(x)^t} \sum_{\xi \in u(x):\,(x+\xi)^t = +} \xi$. This vector thus 
indicates the direction with the highest density of excited neighbours. The vector is 
not updated if there are no excited neighbours. The integral vector 
$v^t = \sum_{x \in L} v_x^t$, indicating the relative position of the light source, can be 
reduced to a direction vector $u^t = (u_1, u_2) = (\mathrm{sg}(v_1^t), \mathrm{sg}(v_2^t))$, where 
$\mathrm{sg}(a) = -1$ if $a < 0$, $1$ if $a > 0$, and $0$ otherwise. The value of $u^t$ 
determines the change of direction 
of the robot. (In a wheeled robot, it may control the steering motor, or the 
relative speeds of wheel rotation in a differential-drive system.) If every cell of 
the excitable array has its own effector, then we do not need to calculate the 
integral vector; the motion of the robot toward the source of light will be the 
result of the summed activity of the collective of effectors. 
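The direction computation can be sketched as below. This assumes that the integral vector is the sum of the per-cell vectors over the whole array, that offsets are (row, column) with the row index growing downward, and that cells outside the array are absent. How u^t is mapped onto steering is left open, as in the text. The function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple

EXCITED = 1
OFFSETS = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]


def sg(a: float) -> int:
    return -1 if a < 0 else (1 if a > 0 else 0)


def update_direction(grid: List[List[int]],
                     vectors: Dict[Tuple[int, int], Tuple[float, float]]):
    """Update each cell's v_x in place; return the integral vector v and u = (sg(v1), sg(v2))."""
    n, m = len(grid), len(grid[0])
    for i in range(n):
        for j in range(m):
            hits = [(di, dj) for di, dj in OFFSETS
                    if 0 <= i + di < n and 0 <= j + dj < m and grid[i + di][j + dj] == EXCITED]
            if hits:  # v_x keeps its previous value when no neighbour is excited
                vectors[(i, j)] = (sum(d[0] for d in hits) / len(hits),
                                   sum(d[1] for d in hits) / len(hits))
    v1 = sum(v[0] for v in vectors.values())
    v2 = sum(v[1] for v in vectors.values())
    return (v1, v2), (sg(v1), sg(v2))
```

For a 3 × 3 array whose only excited cell is the top-middle one, the call returns `((-3.0, 0.0), (-1, 0))`, that is, "up" toward the excited cell. Because vectors are only overwritten when a cell sees excitation, a following all-rest step returns the same result.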

In simulation experiments (Figure 1), several situations of interest were in- 
vestigated. As will be described later, not all parametric selections gave good 
results, but the behaviour of the better performers was certainly encouraging. 
It was demonstrated that in the absence of a light source the robot can wander 
apparently at random, implementing a search of the environment due to the 
small stochastic activity of the edge cells; it is perhaps worth noting that this 
exploration in the absence of light can lead to the discovery of a source which 
is obscured by some obstacle. This functionality is often deliberately added to 
small behaviour-based robots. A single source of light can certainly attract the 
robot. The situation becomes more complex and interesting when there are several 
sources of light. If $p^*$ is constant then, depending on the initial position of 
the robot, the robot may wander between the sources, or be attracted to a single 
source. (It is worth comparing the abilities of this simple simulation with 
the abilities of the earliest simple autonomous robots [23], [10].) Making 
$p^*$ proportional to the intensity of stimulation apparently enables the robot to 
choose the target with the maximum intensity of light; additionally, if all targets 
have the same luminance, a single target will apparently be chosen. (This is a 
consequence of the stochasticity of the edge cells.) 
We have analysed the behaviour of simulated robots with onboard excitable lattices for all possible intervals $[\theta_1, \theta_2]$ of cell excitation. It is possible to subdivide the rule space into four basic classes: (i) useless: unsuitable for navigation; robots do not approach a static target at all, interval [1,2], Figure 2, A; (ii) not adaptive: robots initially approach a static target but then overshoot and collide with the boundaries, intervals [1,4], ..., [1,8] and [2,4], ..., [2,8], Figure 2, B; (iii) adaptive: robots successfully follow the target even if it moves, intervals [1,1], [2,2] and [2,3], Figure 2, C; (iv) clumsy: robots can approach a target but do so in a very nonlinear manner, making wide excursions and oscillations, interval [1,3], Figure 2, D.
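The excitation intervals (such as [1,2] or [2,8]) refer to a transition rule that is not restated in this section. The sketch below rests on our own assumption that it is a three-state rule with states rest, excited and refractory. Under that rule, a resting cell becomes excited when the number of excited cells among its 8 neighbours lies in $[\theta_1, \theta_2]$. Excited cells become refractory, and refractory cells return to rest. All names are hypothetical.

```python
REST, EXCITED, REFRACTORY = 0, 1, 2

def step(grid, theta1, theta2):
    """One synchronous update of an interval-excitable lattice (assumed rule)."""
    n = len(grid)
    new = [[REST] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = grid[i][j]
            if s == EXCITED:
                new[i][j] = REFRACTORY
            elif s == REFRACTORY:
                new[i][j] = REST
            else:
                # count excited cells in the 8-cell neighbourhood (no wrap-around)
                k = sum(grid[i + di][j + dj] == EXCITED
                        for di in (-1, 0, 1) for dj in (-1, 0, 1)
                        if (di, dj) != (0, 0)
                        and 0 <= i + di < n and 0 <= j + dj < n)
                new[i][j] = EXCITED if theta1 <= k <= theta2 else REST
    return new

def all_intervals(max_theta=8):
    """Every interval [theta1, theta2] with 1 <= theta1 <= theta2 <= max_theta."""
    return [(a, b) for a in range(1, max_theta + 1) for b in range(a, max_theta + 1)]
```

With an 8-cell neighbourhood there are 36 such intervals, which bounds the rule space swept in the classification above.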

The suitability of the excitation rules (measured in distance-to-target units) is shown in Figure 2. Does the performance of the robot controllers, as a function of the interval of excitation, support a classification of the excitation rules that fits the morphological classification [3] we introduced earlier? This is true to some extent. The class of computationally universal rules ([2,2] interval) joins the otherwise unrelated rules which are responsible for chaos-like activity patterns to form the adaptive following class. All rules that generate one or another kind of waves, however, fall into the class of non-adaptive following. The complete absence or extreme instability of generators of excitation in all members of the adaptive following class [5] is the feature that explains the good behaviour of the robots in this class.

As we have already discussed in our previous papers ([2] and [3]), there are no persistent activity patterns in excitable lattices when $\theta_1 > 2$. In the arrangement discussed here, where the lattice is onboard and stimulated by light, only the cells near the edges of the lattice can be excited. A cumulative effect plays a key role in this situation. The robot initially chooses the appropriate direction towards the target, but continues on that course and passes the target. It cannot change direction until a sufficient number of new vectors have been generated to overcome the effects of the old vectors. (If this does not happen quickly enough, it may result in a collision with one of the arena boundaries.) However, even while stationary, the robot still responds to the illumination from the target, slowly updates the local vectors and then starts to move towards the target again (Figure 2(E)).

It is clear from these simulations that we badly need a reset button to better utilize the wave processes which occur in some important families of excitable media. For the light-sensitive wave-generating BZ reaction, a reset can be achieved by illuminating the medium with white light for 15-30 s [14]. Let us look at what happens when a reset operation is used. In the experiments with interval excitable media we used a periodic reset, i.e. all cells of the array are forced into the rest state every $n$th step of the evolution, where $n$ is the linear size of the array. Under this regime, the behaviour of the robot becomes surprisingly uniform across excitation rules: all follow the target adaptively. In all cases the robot reaches the vicinity of the target and then wanders near it (Figure 2(F)). Even if generators of waves are formed, they will be eliminated by the reset operation.
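The periodic reset can be written as a wrapper around any lattice update. `run_with_reset` and `update` are hypothetical names. The rest state is assumed to be encoded as 0. The reset simply overwrites every cell at steps $n, 2n, \ldots$.

```python
from typing import Callable, List

Grid = List[List[int]]
REST = 0  # assumed encoding of the rest state

def run_with_reset(grid: Grid, update: Callable[[Grid], Grid],
                   steps: int, period: int) -> List[Grid]:
    """Apply `update` for `steps` steps, forcing every cell to rest every `period` steps.

    In the text the period equals the linear size n of the array.
    Returns the lattice after each step.
    """
    history = []
    for t in range(1, steps + 1):
        grid = update(grid)
        if t % period == 0:
            grid = [[REST] * len(row) for row in grid]  # the reset operation
        history.append(grid)
    return history
```

Because the reset clears every cell, any self-sustaining wave generator can survive at most `period` steps. This matches the observation that wave generators are eliminated.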
Fig. 2. Performance (measured in distance to target) of the robots controlled by an excitable lattice of 50 × 50 cells, $p^* = 0.1$. A. Useless class. Single run for $\theta_1 = 1$ and $\theta_2 = 2$. B. Not adaptive following. Single runs for [1,4], ..., [1,8] and [2,4], ..., [2,8]. C. Adaptive following. Single runs for [1,1], [2,2] and [2,3]. D. Clumsy behaviour. Single run for [1,3]. E. Single run for $\theta_1 = 3$ and $\theta_2 = 4$. F. Effects of the reset operation: single runs for $\theta_1 \in \{1, 2\}$, $\theta_2 \in \{1, \ldots, 8\}$, $\theta_2 \ge \theta_1$. The reset operation is applied every 50th step of the simulation time. Two outstanding peaks correspond to the [2,2] and [2,8] intervals of excitation.
In evaluating the performance of the simulated robot, it might be useful
to bear in mind the abilities of other, perhaps simpler architectures capable 
of approaching or following a source; for example, Braitenberg’s vehicles 2b, 3a, 
and 3b are obvious candidates [7]. However, our real emphasis here is on showing 
that the simulated robot is at least capable of following a source, rather than on 
showing that it does so with any relatively high level of performance, or relative 
economy of means. 



3 Optimal path finding 



The problem is how to reach a specified destination without colliding with obstacles and with minimum waste of energy (e.g. in minimal time). We take for granted that the whole configuration of the arena is represented in the reaction space by inducing deviations of the medium's characteristics from the average. There are many ways to draw pictures with inhomogeneities in a homogeneous medium. The most common are (i) physical compartmentalization of
the medium (e.g. construction of impenetrable walls to represent a labyrinth), 
(ii) projection of the data configuration using light with certain characteristics, 
and (iii) injection of another reagent into the sites corresponding to the data 
configuration. In fact, three key problems of computational geometry strongly 
relate to optimal path finding: (i) approximation of the shortest path; the search 
for the exit out of a labyrinth is just another interpretation of this problem; 
(ii) approximation of obstacle free paths; this is computed in the form of the 
Voronoi diagram of the obstacles; (iii) approximation of the tree connecting all 
possible destinations; this is computed in the form of the skeleton of the planar 
polygon vertices, any of which may be possible destinations of the robot. 

The Babloyantz-Sepulchre algorithm [6] was the first simulation-based work to demonstrate, ten years ago, the potential of excitable media for some combinatorial optimization problems. In their paper the physical environment is mapped onto an octagonal network of oscillators. Obstacles are represented by inactive oscillators and the waves are generated at the destination point. The robot moves towards the locally highest value of one of the concentration variables. The viability of the idea was demonstrated in laboratory experiments ([19]).
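A toy version of this wave-based planner follows. It assumes a grid map in which '#' marks obstacles (inactive cells), and the wave spreads to the 8 neighbours one cell per step. The arrival time of the front stands in for the concentration variable, so repeatedly moving to the neighbour the wave reached earliest retraces the front back to the destination. This is a discrete breadth-first caricature with hypothetical names, not the oscillator model of [6].

```python
from collections import deque

MOVES = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]

def wave_arrival_times(grid, target):
    """Spread a wave from `target` through free cells; '#' cells are obstacles.
    Returns {cell: step at which the wave front arrives}."""
    rows, cols = len(grid), len(grid[0])
    times = {target: 0}
    front = deque([target])
    while front:
        i, j = front.popleft()
        for di, dj in MOVES:
            ni, nj = i + di, j + dj
            if (0 <= ni < rows and 0 <= nj < cols
                    and grid[ni][nj] != '#' and (ni, nj) not in times):
                times[(ni, nj)] = times[(i, j)] + 1
                front.append((ni, nj))
    return times

def robot_path(grid, start, target):
    """Step repeatedly to the neighbour the wave reached earliest."""
    times = wave_arrival_times(grid, target)
    if start not in times:
        return None  # the wave never reached the robot: target unreachable
    path = [start]
    while path[-1] != target:
        i, j = path[-1]
        path.append(min(((i + di, j + dj) for di, dj in MOVES
                         if (i + di, j + dj) in times), key=times.get))
    return path
```

Each step lowers the arrival time by exactly one, so the robot reaches the destination in as many moves as the front needed to reach the robot.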

In recent experiments [14] a new approach has been invented as an alternative to the physically inhomogeneous representation of the environment structure. It has been demonstrated that we can map the configuration of an obstacle onto the reaction space without even disturbing the structure of the space. To upload the topology of a labyrinth into the reaction medium, it is enough to expose the reaction layer to light whose intensity distribution determines the initial image of the labyrinth. Moreover, we can use phase waves instead of trigger waves. Phase waves are independent of diffusion, and propagate along the phase gradients in an oscillatory medium [14]. Phase waves are faster than trigger waves. It is also possible to exploit the so-called light-sensitive phase waves [14]. The evolution of the positive/negative image depends on the intensity of the superimposed background, and so the phase wave virtually travels along the corridors of the labyrinth.

It is possible to build processors for the approximation of the Voronoi diagram and the skeleton which are based on entirely chemical inputs and optical outputs (see e.g. [20], [4]). The Voronoi diagram of a given planar finite set $P$ is defined as a partition of the plane into convex regions, one per element of the set. Every convex region is assigned to some element $p \in P$ of the given set and contains all points of the space that are closer to $p$ than to any other element of $P$. The edges of these convex regions form the Voronoi diagram. A skeleton of a planar shape is the set of centres of bitangent circles lying entirely inside the contour of the shape. After testing numerous possibilities we have found two ways to approximate both structures; both methods can be run onboard while the robot is moving. The first chemical controller is made of an agar-palladium thin layer with potassium iodide liquid diffusing in it. The second one is based on the formation of Prussian blue on an agar film. In both controllers the result of the computation is represented by the absence of precipitate, and, as usual, it may be detected optically.
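The wave picture of the Voronoi diagram can be sketched as a multi-source breadth-first search on a lattice. Fronts start at all sites at once, and each cell is claimed by the first front to reach it. Cells bordering a cell with a different owner mark where two fronts met, loosely analogous to the precipitate-free lines in the chemical processors. The city-block metric and the function name are our assumptions. The result only approximates the Euclidean diagram.

```python
from collections import deque

def discrete_voronoi(n, sites):
    """Approximate the Voronoi diagram of `sites` on an n x n lattice.

    Waves start at every site simultaneously and advance one cell per step
    (4-neighbourhood, i.e. city-block distance). Each cell is owned by the
    first wave that reaches it (ties broken by queue order). Cells adjacent
    to a cell with a different owner form a thick approximation of the diagram.
    """
    owner = {}
    front = deque()
    for k, s in enumerate(sites):
        owner[s] = k
        front.append(s)
    while front:
        i, j = front.popleft()
        for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= nb[0] < n and 0 <= nb[1] < n and nb not in owner:
                owner[nb] = owner[(i, j)]
                front.append(nb)
    boundary = {
        (i, j) for (i, j), k in owner.items()
        if any(owner.get(nb, k) != k
               for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)))
    }
    return owner, boundary
```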



4 Universal controllers and self-localised excitations 



By universal controllers we mean the families of nonlinear media in which universal logical gates can be realised; in theory, they will enable us to implement any computational scheme for navigation. The easiest and most practical way to build a universal computer in an excitable medium is to use the paradigm of the billiard ball model developed in [8]. Quanta of information are represented by self-localised excitations, or particle-like waves, that travel in space and perform computation by interacting with other travelling signals [1]. The idea is orthogonal to conventional techniques of constructing logical gates in chemical computers, which involve constraining reactions in tubes (see e.g. [21]).




Fig. 3. The principal scheme of the control of a robot by an excitable medium using self-localized excitations. Particle-like waves are induced by light stimulation of the sensors $x_l$ and $x_r$. They travel toward motor units $y_l$ and $y_r$ and transmit energy (excitation) to them. If two waves collide they disappear. So, the functions $y_l = x_r \wedge \neg x_l$ and $y_r = x_l \wedge \neg x_r$ are realised. Assume a constant speed of the wheels of the wheeled robot model without any external stimulation, and increment these speeds by the effects of the particle-like waves. The robot then changes its direction of movement adaptively, adjusting it to the current direction of the light target.
The basics of universal computation in real and simulated excitable media, together with possible real implementations, are discussed in [1]. Here, due to lack of space, we can only describe a minimal scheme where self-localised excitations do not merely transmit signals from the sensors to the effector units of the robot, but also interact to adjust the orientation of the robot to a light target (Figure 3).
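The collision logic of the Figure 3 scheme can be checked in a few lines. The sketch assumes that each sensor's particle travels to the opposite motor unit and that colliding particles annihilate. `base` and `boost` are illustrative wheel speeds, not values from the paper.

```python
def motor_signals(x_l: bool, x_r: bool):
    """Particles cross over to the opposite motor unit; if both sensors fire,
    the two particles collide and annihilate. Returns (y_l, y_r)."""
    return (x_r and not x_l, x_l and not x_r)

def wheel_speeds(x_l: bool, x_r: bool, base=1.0, boost=0.5):
    """Constant base speed, incremented on the wheel whose motor unit gets a particle."""
    y_l, y_r = motor_signals(x_l, x_r)
    return base + boost * y_l, base + boost * y_r

def turn_direction(x_l: bool, x_r: bool):
    """Differential drive: a faster right wheel turns the robot left, and vice versa."""
    v_l, v_r = wheel_speeds(x_l, x_r)
    if v_r > v_l:
        return "left"
    if v_l > v_r:
        return "right"
    return "straight"
```

Light on the left sensor speeds up the right wheel and turns the robot left, towards the light. Light on both sensors (target straight ahead) leaves the wheel speeds equal.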

5 Further steps 

We have found that a range of excitable media, independently of their chemi- 
cal or structural origin, exhibit the types of activity which have the potential 
to be applied successfully to the problems of robot navigation. Are there any 
other suitable materials and prototypes that can be also used for the same task? 
The printed excitable media [18] are one possibility. The catalyst of the BZ 
reaction is printed on a polysulphone membrane with an ink-jet printer; therefore, virtually any pattern of catalyst can be printed. It may be possible, for example, to produce discreteness-induced self-localized excitations by printing a lattice. Scheibe aggregates [12] are monolayers that are present in nature in a wide spectrum of photochemical reactions; they can be built as Langmuir-Blodgett films in laboratory conditions. These aggregates possess unique transfer properties, namely excitons, which are induced by light and which travel along the two-dimensional array of molecules like particles (compare with the universal controllers). Worms, or localised structures in binary-mixture Rayleigh-Bénard convection [22], form another field of opportunities. Depending on their particular states, the worms can stop, continue their motion undisturbed, or selectively disappear as a result of binary collision [22]. Other subjects were regrettably left out of this paper. They are metabolic controllers of robot motion [24], and one-dimensional nonlinear media, especially particle machines (e.g. [17] and [11]).

6 Conclusions 

Our review of the abilities of chemical systems to solve certain spatial problems 
has indicated the theoretical possibility of using those systems for the control 
of a robot. Our simulations of a robot controlled in such a way show that the 
basic navigational requirements are apparently easy to achieve; we have also 
noticed a useful relationship between the observed patterns of behaviour, and 
our existing analysis of parametric descriptions of activity in excitable media. 
We are continuing this work by undertaking a feasibility study to investigate the 
construction and experimental assessment of a physical robot using this type of 
chemical control. 
Abstract. In this paper, we present a neural architecture that allows a mobile robot to learn how to imitate a sequence of actions. We show that a continuous and dynamic representation of the information is necessary, and that neural fields can be a good solution for controlling the dynamics of several degrees of freedom with a single internal representation.



1 Introduction 

Until now, our work has been mainly focused on the design of a neural net- 
work architecture (named PerAc: Perception-Action) for the control of a visu- 
ally guided autonomous robot. However, the PerAc architecture does not help 
to solve problems which have an intrinsic high dimension. Therefore imitation of 
already learned behaviors or subparts of a behavior not completely discovered is 
certainly one way to allow a population of animals or robots to learn and to find 
solutions by themselves. Learning by imitation is already used in a few projects 
of Artificial Intelligence (see [2,3,5]). In our previous work [6], we proposed a neural architecture for imitation based on visual information. We showed how to use it to teach the robot to perform a particular sequence of movements (to make a zigzag trajectory, a square ...). In this paper we try to put together two ideas: how a PerAc architecture can be used for learning by imitation, and how the properties of the neural fields can be used to improve the motor control.

2 Neural network for sequence imitation 

For the imitation behavior, we start with the assumption that proto-imitation (non-intentional imitation) is triggered by a perception error (see [6] for details). Fig. 1 presents an overview of a general PerAc architecture using this principle. The reflex path of PerAc works as a movement tracking mechanism which consists of going towards any perceived movement. The second level of the architecture learns the temporal interval between the successive robot orientations (i.e. a sequence of movements), and associates it with a particular motivation.
Fig. 1. A general diagram of the PerAc architecture used for learning the temporal aspects of a trajectory. CCD - CCD camera, M - Motivations, MI - Movement Input, MO - Motor Output, TD - Time Derivator, TB - Time Battery, PO - Prediction Output



A frame-grabber is used to take a sequence of images. In one of our simplest implementations, a “movement image” is the difference between two differently time-integrated images of the above sequence. The perceived movement orientation is computed from the “movement image”. The result is “projected” one-to-one onto a map of analog formal neurons, the Movement Input (MI) group in Fig. 1. To avoid perception errors in the tracking mechanism, we allow the robot camera (robot head) to rotate. In this way, the head tries to pursue the teacher at all times by keeping it centred in its visual field. The robot body turns only if the teacher movement is observed under the same angle for a given time interval. The independent rotation of the robot head and its body can be viewed as a simple two-degrees-of-freedom system. The functioning of the motor group (MO) is quite simple. At each step, a winner-take-all (WTA) mechanism chooses the most activated neuron, performs the rotation corresponding to this neuron and finishes with a fixed translation. The MO group uses the same information representation as the MI group. It receives information from both the reflex level and the event prediction level.
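The movement image and the winner-take-all choice can be sketched on tiny greyscale images given as lists of rows. Several choices here are hypothetical: the two integration rates, the field of view `fov_deg`, and the use of one MI neuron per image column. The original system may compute the movement orientation differently.

```python
def leaky(prev, frame, rate):
    """Leaky temporal integration of two images given as lists of rows."""
    return [[(1.0 - rate) * p + rate * f for p, f in zip(prow, frow)]
            for prow, frow in zip(prev, frame)]

def movement_image(frames, fast=0.8, slow=0.2):
    """|fast-integrated image - slow-integrated image| (illustrative rates)."""
    fast_img = slow_img = [[float(v) for v in row] for row in frames[0]]
    for frame in frames[1:]:
        fast_img = leaky(fast_img, frame, fast)
        slow_img = leaky(slow_img, frame, slow)
    return [[abs(a - b) for a, b in zip(ra, rb)] for ra, rb in zip(fast_img, slow_img)]

def wta_rotation(frames, fov_deg=60.0):
    """MI map: one neuron per image column, fed by that column's summed movement.
    A winner-take-all picks the most active neuron, mapped to a rotation in degrees."""
    img = movement_image(frames)
    mi = [sum(col) for col in zip(*img)]
    winner = max(range(len(mi)), key=mi.__getitem__)
    return winner, (winner / (len(mi) - 1) - 0.5) * fov_deg
```

Suppose a bright bar appears in column 15 of a 21-column image. The winner is then neuron 15, which maps to a rotation of +15 degrees.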

In order to learn a sequence, the student robot must detect and learn the transitions in its own body orientation so as to be able to reproduce them. The movement rotations characterized by OFF-ON transitions of MO neurons (Time Derivative, TD group) are used as input information for a bank of spectral neurons (TB in Fig. 1). Time filter batteries (TB) act as delay neurons endowed with different time constants. As such, they perform a spectral decomposition of the signal. This decomposition allows the neurons in the Prediction Output group (PO) to store the transition patterns between two events in the sequence. Finally, the PO group is linked with the MO group via one-to-all modifiable links.
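Here is one way to picture the time battery. Each TB cell responds to an event with a bump delayed by its own time constant. The PO neuron stores the TB pattern present when the next event arrives. Later it responds most strongly when the same pattern recurs, i.e. after the learned interval. The Gaussian response shape, the cosine matching and all names are our assumptions, not the authors' equations.

```python
import math

def battery(t, centres, width_ratio=0.3):
    """Activities of the TB cells t time units after an event.
    Assumed shape: Gaussian bump centred on m with std width_ratio * m."""
    return [math.exp(-((t - m) ** 2) / (2 * (width_ratio * m) ** 2)) for m in centres]

def normalise(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def learn_interval(interval, centres):
    """PO weights = normalised TB pattern sampled when the next event occurs."""
    return normalise(battery(interval, centres))

def predicted_time(weights, centres, horizon, dt=0.01):
    """Time after the first event at which the PO neuron responds most strongly
    (cosine match between the stored and the current TB pattern)."""
    best_t, best_score = None, -1.0
    for k in range(1, int(horizon / dt)):
        t = k * dt
        score = sum(w * a for w, a in zip(weights, normalise(battery(t, centres))))
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

Because each TB cell peaks at a different delay, the stored pattern is specific to the learned interval. The PO response therefore peaks at that delay and can trigger the next movement of the sequence.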

3 Neural dynamics of the motor system

The first limitation in our architecture is the poor stability of the tracking behavior. Even if the temporal integration allows a memory effect, any new input stimulus can generate an immediate change of the head orientation (a classical WTA decision). A second major limitation is input discrimination. Two or more movement zones can be interpreted as different targets, or as the same target due to a perception error. In the present system, no interpretation of the perceived movement is performed, in order to avoid misinterpretation. The motor group has to be a topological map of neurons that integrates the input information dynamically, so that the previously tracked target is not forgotten. A dynamical competition also has to be used to avoid intermittent switching from one target to another.

We will use the simplified formulation of the neural field proposed and studied by Amari [1],

$$\tau \frac{\partial f(x,t)}{\partial t} = -f(x,t) + I(x,t) + h + \int_{V_x} w(z)\, g\big(f(x-z,t)\big)\, dz \qquad (1)$$

Without inputs, the homogeneous pattern of the neural field, $f(x,t) = h$, is stable. The inputs of the system, $I(x,t)$, represent the stimulus information which excites the different regions of the neural field, and $\tau$ sets the relaxation rate of the system. $w(z)$ is the interaction kernel in the neural field activation. These lateral interactions (“excitatory” and “inhibitory”) are modeled by a difference-of-Gaussians (DOG) function. $V_x$ is the lateral interaction interval, and $g(f(x,t))$ is the activity of neuron $x$ according to its potential $f$. We use a classic ramp function.
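The following is a minimal discretisation of Eq. (1). It assumes a ring of neurons (orientation is periodic), forward Euler integration and a ramp output saturating at 1. The DOG parameters are illustrative; none of the numerical values come from the paper.

```python
import math

def dog(z, a_exc=1.0, s_exc=3.0, a_inh=0.5, s_inh=8.0):
    """Difference-of-Gaussians lateral kernel w(z)."""
    return (a_exc * math.exp(-z * z / (2 * s_exc ** 2))
            - a_inh * math.exp(-z * z / (2 * s_inh ** 2)))

def ramp(u, u_max=1.0):
    """Classic ramp output function g: 0 below 0, linear, saturating at u_max."""
    return min(max(u, 0.0), u_max)

def simulate_field(inputs, h=-0.5, tau=10.0, dt=1.0, steps=200):
    """Euler integration of tau df/dt = -f + I + h + sum_z w(z) g(f(x - z))
    on a ring of len(inputs) neurons."""
    n = len(inputs)
    w = [dog(min(z, n - z)) for z in range(n)]  # symmetric kernel on the ring
    f = [h] * n
    for _ in range(steps):
        g = [ramp(u) for u in f]
        lateral = [sum(w[z] * g[(x - z) % n] for z in range(n)) for x in range(n)]
        f = [u + dt / tau * (-u + i + h + l) for u, i, l in zip(f, inputs, lateral)]
    return f
```

With no input, the field stays exactly at the resting level $h$. A localised input produces a peak at the stimulated orientation, while distant neurons stay below threshold.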

G. Schöner [7,4] has proposed using the properties of the neural field for motor control problems. The “read-out” mechanism uses the derivative of the neural field activation to compute the motor command. The orientation of the robot head, $\phi_{rob}$, relative to a fixed reference is used in the system as a behavioral variable. The state of the system is expressed as a value of this variable. The local maxima of the neural field are named attractors. If the target orientation is $\phi_{tar}$ (see Fig. 2, a), it erects an attractor in the neural field (see Fig. 2, b), and the robot rotation speed will be

$$\omega = \frac{d\phi_{rob}}{dt} = F(\phi_{rob}).$$

$F$ is a function of the current robot orientation, $\phi_{rob}$. It sets the dynamics of our robot.
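The read-out can be illustrated with a fixed activation bump standing in for the settled field. Here we assume that the function $F$ of the current orientation is a gain times the slope of the activation at that orientation, so the robot climbs to the local maximum (the attractor). The bump width, gain and step size are hypothetical.

```python
import math

def wrap(a):
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))

def field_activation(phi, phi_tar, width=0.3):
    """Stand-in for a settled field: one bump (attractor) centred on phi_tar."""
    d = wrap(phi - phi_tar)
    return math.exp(-d * d / (2 * width ** 2))

def read_out(phi_rob, phi_tar, gain=1.0, eps=1e-4):
    """omega = F(phi_rob), taken here as gain * d(activation)/d(phi) at phi_rob."""
    return gain * (field_activation(phi_rob + eps, phi_tar)
                   - field_activation(phi_rob - eps, phi_tar)) / (2 * eps)

def turn_towards(phi_rob, phi_tar, dt=0.05, steps=2000):
    """Euler-integrate d(phi_rob)/dt = F(phi_rob)."""
    for _ in range(steps):
        phi_rob += dt * read_out(phi_rob, phi_tar)
    return phi_rob
```

The rotation speed vanishes at the peak and points towards it on either side. This is what makes the local maximum an attractor of the robot's orientation.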

Taken separately, each input erects an attractor in the neural field. Amari's equation allows cooperation between coherent inputs and competition between inputs associated with different goals (spatially separated targets). For closely spaced inputs, the dynamics has a single attractor corresponding to the average of the input information. At a critical distance between inputs, a bifurcation point appears: the previous attractor becomes a repellor and two new attractors emerge. Depending on the initial state, the robot switches to one of the two new fixed points. This mechanism of input competition/cooperation has a hysteresis property which avoids oscillations between the two possible behaviors. Another feature of the neural field is memory. If the parameter $h$ in Eq. (1) has a sufficiently negative value, then the neural field operates with a memory effect in which the peak of an attractor is maintained for a short time interval. A large positive value of $h$ determines a supra-threshold in the neural field activation. We use
Fig. 2. a) The robot and the target coordinates are represented in the same reference frame. The reference orientation, $\phi_0$, is used to compute $\phi_{rob}$ and $\phi_{tar}$. b) The target position erects an attractor at $\phi_{tar}$. The “read-out” mechanism computes the rotation speed $\omega$ from the derivative of the neural field activation.



the inputs of the actual system to drive a motor command using a neural field without any modification. Replacing the MO group by a neural field is the sole modification in the architecture (see Fig. 1). All the above properties of the neural field carry over to the general architecture, eliminating the input segmentation and stability problems of the initial architecture.

4 Experimental results and discussion 

At first, we implemented the tracking reflex using only one degree of freedom, i.e. the robot moves only its head. To demonstrate the capability of the neural field to control several degrees of freedom, we take a simple example. The robot follows a “teacher” and learns a sequence of movements ABC. The sequence starts with the activation of the neuron corresponding to state A (an orientation). The input to the neural field generates an attractor at the $\phi_A$ orientation (see Fig. 3).

At the learned time, the neuron corresponding to B is activated by the PO group. This activation shifts the attractor to $\phi_B$ in the neural field. Using the “read-out” mechanism, we obtain two rates of orientation change (due to differences in inertia): one for the head orientation and another for the robot body orientation. At the top of Fig. 3, we show the variation of head and body orientation as a function of time. According to the neural field dynamics, the change of orientation is continuous. For an external observer, the head orientation anticipates the body orientation (i.e. the inertia of the robot is learned too).
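Here is a linearised stand-in for the head and body behaviour. Near an attractor, the read-out behaves like a first-order relaxation. Giving the head a shorter time constant than the body therefore reproduces the head anticipating the body when the attractor jumps from A to B to C. The time constants are illustrative values chosen by us.

```python
def track_sequence(targets, switch_every, tau_head=2.0, tau_body=8.0, dt=0.1):
    """Head and body orientations relaxing towards the current attractor.

    Each degree of freedom follows the attractor with its own time constant,
    so the lighter head leads the heavier body. Returns (target, head, body)
    tuples, one per step.
    """
    head = body = targets[0]
    log = []
    for k in range(len(targets) * switch_every):
        target = targets[k // switch_every]  # attractor shifts A -> B -> C
        head += dt / tau_head * (target - head)
        body += dt / tau_body * (target - body)
        log.append((target, head, body))
    return log
```

Both orientations change continuously. After each jump of the attractor, the head is always closer to the new orientation than the body.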

This work is at an early stage. Its interest lies in its use of the neural field concept in a PerAc architecture. We show that we can learn the temporal sequence
of movements by imitation using a PerAc architecture. The tracking mechanism 
in the reflex path of PerAc permits the temporal “segmentation” of the “teacher” 

Fig. 3. Top: the temporal variation of the head and of the body orientation. Bottom: the neural field activation for an ABC sequence. The bar represents the predicted movement.



movements without having to learn to recognize visually what the teacher is doing. The use of the neural field improves the stability of the proto-imitation process and permits the discrimination of moving objects in the visual perception field.
Abstract. We present a neural model for the control of an animat. The model is based on two structures. The first one enables visual navigation using landmarks. It may be used in unknown and changing environments.

The second structure enables building a proximity map of the environ- 
ment. Using this map, an animat may successfully reach different goals 
linked to different motivations and solve various types of action selection 
problems. 



1 Introduction 

Taking inspiration from neurobiology and psychology, we develop artificial neu- 
ral network models for the navigation of a mobile robot. Our research is driven 
by three different influences. First, our goal is to build a system which allows a 
robot to “behave” autonomously in an a priori unknown environment (animat 
approach). Second, we also base our work on a constructivist approach. The robot's perception is primarily based on vision through a CCD camera. We do not provide any explicit information about the structure of the external world. The integration of sensory information allows the robot to create representations of the external world which are based on real data (symbol grounding problem). Finally, our third motivation is to validate implications of neurobiological models, and possibly to propose some new hypotheses to explore. We claim that this approach may boost the learning approach to autonomous robots.

We will first describe our model and then report our results on the selection between different goals. These results come from computer simulations, but we emphasize that part of this work is already running in a real robot experiment.



2 Model 

We have developed a model explaining some capabilities that enable an animat to return to a learned place from unexplored locations in the same environment [4,3]. A neurobiological inspiration is the finding of “place cells” in the rat hippocampus [5]. These cells fire when the rat is at a particular location in an area. In our architecture, each “place cell” (or learned location) is coded by a set of (landmark, azimuth) pairs. We don’t provide any external explicit description of the environment (world model), nor do we learn what to do for each location in the environment. The recognition of a learned location is based on the recognition of the landmark configuration.
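Place recognition from (landmark, azimuth) pairs can be sketched as follows. Several choices are hypothetical and serve only as illustration: the Gaussian azimuth match, its width `sigma`, the normalisation by the number of stored landmarks, and the function names. They are not the model's actual place-cell equations.

```python
import math

def place_activation(stored, current, sigma=math.radians(20)):
    """Hypothetical place-cell response: Gaussian azimuth match summed over the
    landmarks seen both when the place was learned and now, divided by the
    number of stored landmarks (so missing landmarks lower the response)."""
    common = stored.keys() & current.keys()
    if not common:
        return 0.0
    total = 0.0
    for lm in common:
        d = math.atan2(math.sin(current[lm] - stored[lm]),
                       math.cos(current[lm] - stored[lm]))  # wrapped azimuth error
        total += math.exp(-d * d / (2 * sigma * sigma))
    return total / len(stored)

def recognise(places, current):
    """Name of the learned place whose cell is most active (winner-take-all)."""
    return max(places, key=lambda name: place_activation(places[name], current))
```

No world model is involved: a place is recognised only by how well the current landmark configuration matches a stored one.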

When the visual environment of the goal cannot be perceived because of large obstacles, place recognition alone is not enough, since the robot always tries to make shortcuts and goes directly to the goal. A special reflex does, however, allow obstacle following. Yet the best way would be to learn to make a detour around an obstacle, instead of going towards it and then following it (Fig. 1). The same drawback appears in an environment with several rooms. As it stands, the animat cannot go from one room to the other directly. We need to introduce a kind of map of the environment to be able to perform this task. To represent this map, we add a group of neurons able to learn the relationships between successively explored places. Since temporal proximity is equivalent to spatial proximity, the system creates a topological representation of its environment. We will call this last group our “cognitive map” (or goal map) (Fig. 1) [7]. The activation of a motivation induces the activation of a goal in the map
which is linked to the satisfaction of this motivation. For instance, the “eating” 
motivation activates the goals where the animat has discovered food. Then, this 
activity diffuses on the map beginning from the goals, and activates place cells 
according to their distance (in number of links) to the goal. First, the animat 
tries to follow the gradient of neuron activity in this cognitive map to select the 
next location to reach. The most activated goal or subgoal in the neighborhood of 
the current animat location is then selected and used to attract the animat in its 
vicinity. When the animat is close enough to this location, the associated subgoal 
is inhibited and the animat is attracted by the next subgoal and so on until it 
reaches the final goal. The algorithm is guaranteed to find the shortest path in the graph (it is equivalent to the Bellman-Ford algorithm of graph exploration [2]). The principle of this kind of cognitive map is not new. The novelty in this paper is that our algorithm can solve planning problems involving several moving goals in a dynamic environment (the sources may disappear and appear again elsewhere, obstacles may be moved and landmarks hidden). The map is learned and modified on-line and can manage contradictory goals. The
subgoals correspond to the following situations: the end of an obstacle avoidance 
(for instance, the animat stores the pathway between two rooms) or places badly 
recognized. Learning the cognitive map is performed continuously. There is no 
separation between the learning and the test phases. The links between neurons 
of the graph are reinforced (Hebbian associative learning) for neurons associated with successively recognized places. Let $W_{ij}$ be the weight associated with the fact that from place $i$ it is possible to reach place $j$ directly; its learning rule is the following:



$$\frac{dW_{ij}}{dt} = -\lambda W_{ij} + \left(C + \frac{dR}{dt}\right)(1 - W_{ij})\,\overline{Pot}_i\,Pot_j \qquad (1)$$

where $Pot_i$ is the value of place cell $i$. $Pot_i$ must be held at a non-null value until $Pot_j$ (with $i \neq j$) is activated by the recognition of place cell $j$. This is performed by a time integration of the $Pot_i$ values, represented in the equation by $\overline{Pot}_i$. $\overline{Pot}_i$ decreases with time and can be used as a raw measure of the distance between $i$ and $j$. $\lambda$ is a very low positive value; it allows unused links to be forgotten. $C = 1$ in our simulations. The term $dR/dt$ corresponds to the variation of an external reinforcement signal $R$ (negative or positive) that appears when the animat enters or leaves a “difficult” or “dangerous” area.

Fig. 1. On the left, reflex behavior of obstacle following for reaching the goal (initial speed towards the upper left; the animat has a mass giving it momentum). The dots indicate the trajectory followed by the animat. On the right, cognitive map built by exploration of the same environment. The landmarks are the crosses on the border. Each circle is a subgoal. The links indicate that the two subgoals have been activated in succession. The subgoals and the learned transitions form the goal map.
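A forward-Euler sketch of the link learning rule (1) follows. The `trace` helper is a hypothetical leaky integrator standing in for the time-integrated value of $Pot_i$. All parameter values other than $C = 1$ are illustrative.

```python
def update_weight(w, pot_i_trace, pot_j, dR_dt=0.0, lam=1e-3, C=1.0, dt=1.0):
    """One Euler step of dW/dt = -lam*W + (C + dR/dt) * (1 - W) * trace_i * Pot_j."""
    dw = -lam * w + (C + dR_dt) * (1.0 - w) * pot_i_trace * pot_j
    return w + dt * dw

def trace(prev, pot, decay=0.9):
    """Hypothetical time integration of Pot_i: it jumps to the current activity
    and then decays, so a place reached later sees a smaller value."""
    return max(pot, decay * prev)
```

A quick transition (large trace) strengthens the link more than a slow one, and the factor $(1 - W_{ij})$ keeps the weight below 1. A negative reinforcement with $dR/dt = -C$ cancels the learning term, and $\lambda$ slowly erodes links that are no longer used.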

As constructed, the planning map fits very nicely into the “neural field” framework [1]. Indeed, one can consider that the goal has a neural field extending over each subgoal. Just as it is possible to control the heading direction of a robot [6] by controlling its angular velocity, we control the position of the animat through modifications of its speed introduced by the gradient. Roughly speaking, the speed is a function of the distance to the goal. Hence the planning map not only indicates where the goal is, but also makes it possible to control how to reach it (top-down process).
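Planning on the learned graph can be sketched as max-product activity diffusion from the goals, followed by greedy gradient ascent. The decay factor and the names are assumptions. With all link weights equal to 1, the activity of a place is `decay ** hops`. Following the gradient then traces a shortest path in number of links, in the spirit of the Bellman-Ford comparison made earlier.

```python
def diffuse(links, goals, decay=0.9):
    """Spread activity from goal nodes: each node takes the max over its
    neighbours of (neighbour activity * link weight * decay)."""
    act = {node: 0.0 for node in links}
    for g in goals:
        act[g] = 1.0
    for _ in range(len(links)):  # enough sweeps to converge, as in Bellman-Ford
        for node, nbrs in links.items():
            best = max((act[m] * w * decay for m, w in nbrs.items()), default=0.0)
            if node not in goals:
                act[node] = max(act[node], best)
    return act

def next_subgoal(links, act, here):
    """Follow the gradient: the most activated neighbouring place."""
    return max(links[here], key=act.get)
```

Consider a map with a short route a-b-d and a longer route a-c-e-d to goal d. The animat at a picks b, because b's activity (0.9) exceeds c's (0.81).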

3 Learning to choose between goals 

We will work on the following experiment. The animat has two different energy levels called “food” and “water”. They are linked with two motivations, “hunger” and “thirst”. As the animat moves, the two energy levels decrease. Through exploration, it has to discover “food” and “water” places where it may re-supply one of the two levels. The animat performs a random exploration of the environment in order to discover the resource places. However, we want it to go back to a previously discovered interesting place when it needs to. The animat must therefore solve the dilemma between exploring and reaching a known food and/or water place. In the simulated experiment, the perception of the animat is limited to landmarks and obstacles. The supply level of a resource place is decreased by the amount taken by the animat. When the source is empty, another one appears randomly in the environment. Hence, the animat always has to explore the environment in order to find new potential sources. The environment is a
T-maze. The animat already knows where the sources are. The cognitive map 
follows the middle of the corridors, and does not introduce any a priori bias 
in its use. The maze and the cognitive map are displayed as the first figure of 
each experiment shown below it. In addition to food and water, there is a nest. 
The need to go to the nest increases twice as fast as the need for food or water. Hence the animat has to go back very often to its nest, which is in the bottom arm. In the right arm there is only water, whereas in the left arm the animat can find both water and food. The difference between the two experiments is the distance between the food source and the right water source. In the first experiment, where the food source is only slightly shifted to the left, the right water source remains the preferred one. But as the food source is shifted further to the left, it is the left water source that eventually becomes the most used one. Because the right water source is nearer to the nest than the left one, the animat initially goes into the right arm to drink. When it gets hungry, it goes into the left arm to find food. Since this is the only food supply, the path between the nest and this source is reinforced very often. A large part of this path is also shared with the way to the left and right water sources. Because the animat has to go into the left arm more often to eat, when it explores it is more likely (in fact, it has more time) to reach the left end than the right one, which is farther away. Hence the links between the food source and the left water source are reinforced. So when the animat is thirsty while near the food source, it goes to the left water source. Finally, as the path between the nest and the left food and water sources keeps being reinforced, the animat goes left to drink even when it is near its nest. Only the occasional exploration of the right arm may reactivate the links towards the right water source. But the process then starts again from the same initial point, eventually leading to the reinforcement of the links in the left arm. The animat succeeds in the task without any explicit reinforcement, unlike what must be done with Q-learning, for instance (which also seems to have many problems with a task of this dimensionality: at least 6 continuous dimensions, namely x, y, time, and 3 motivations!). 
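The following Python sketch illustrates, under stated assumptions, how three motivations could be turned into a single choice of direction. It assumes that each motivation has its own goal-activity map over the subgoals and that the animat moves toward the neighbor with the largest need-weighted activity; `update_needs`, `choose_next`, and the growth `rate` are hypothetical. Only the fact that the nest need grows twice as fast as hunger and thirst comes from the text.

```python
from typing import Dict, List

Needs = Dict[str, float]                  # motivation -> urgency in [0, 1]
Activity = Dict[str, Dict[str, float]]    # motivation -> subgoal -> goal activity


def update_needs(needs: Needs, rate: float = 0.001) -> Needs:
    """Hunger and thirst grow at `rate`; the need for the nest grows twice as fast."""
    growth = {"hunger": rate, "thirst": rate, "nest": 2 * rate}
    return {m: min(1.0, needs[m] + growth[m]) for m in needs}


def choose_next(neighbors: List[str], needs: Needs, activity: Activity) -> str:
    """Move toward the neighbor where the strongest need-weighted goal activity is found."""
    def drive(node: str) -> float:
        return max(needs[m] * activity[m].get(node, 0.0) for m in needs)
    return max(neighbors, key=drive)


# Hypothetical T-maze junction: L = left arm, R = right arm, B = bottom arm (nest)
activity = {
    "thirst": {"L": 0.8, "R": 0.9},   # right water source slightly closer
    "hunger": {"L": 0.9},             # food only in the left arm
    "nest": {"B": 0.9},
}
```

With these made-up numbers, a thirsty but barely hungry animat heads right, while a hungry one heads left. The reinforcement of frequently used paths described above would progressively raise the left-arm values and shift the choice, which this static sketch does not model.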



4 Discussion 

The navigation and planning system we have presented is able to solve complex action selection problems. However, for the moment, it has to actually perform the movements in order to reinforce particular paths. An improvement would be the ability to internally replay the trajectory used to reach a goal. 

The main drawback of the algorithm is the computation of the gradient of the neuron activity. Indeed, in environments with a large number of subgoals, this gradient may become very small, so that even a small amount of noise on it would make it impossible to follow. There is therefore a need to combine several maps of different adjoining environments. This means we have to define a planning structure, or plans of plans, in order to address large-scale environments. 
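A short worked example makes the gradient problem concrete. Assume, for illustration only, that every link has the same weight $w$ and that goal activity decays multiplicatively along the map, so a subgoal $n$ links from the goal has activity $w^n$. The activity difference between two successive subgoals is then $w^n(1-w)$, which shrinks exponentially with $n$ and drops below a noise level $\sigma$ once $n > \ln(\sigma/(1-w))/\ln w$. The Python sketch below (`gradient_at_distance` and `first_unfollowable` are hypothetical names) finds that point numerically.

```python
from typing import Optional


def gradient_at_distance(n_links: int, w: float = 0.9) -> float:
    """Activity difference between subgoals n and n+1 links away from the goal."""
    return w ** n_links - w ** (n_links + 1)


def first_unfollowable(noise: float = 0.01, w: float = 0.9,
                       max_links: int = 10_000) -> Optional[int]:
    """Smallest distance (in links) at which the local gradient falls below the noise."""
    for n in range(max_links):
        if gradient_at_distance(n, w) < noise:
            return n
    return None
```

With $w = 0.9$ and a noise level of 0.01, the gradient is already lost 22 links from the goal, which illustrates why a single flat map does not scale and why the text suggests hierarchies of maps.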
Fig. 2. The figures on the top display the initial mazes. Below, the height of each square shows the number of times it has been occupied by the animat (dark (resp. light) color indicates low (resp. high) occupation). The values have been computed by adding 50 different runs of 20000 iterations. In the first experiment (left figure), the animat visits the right source more often than the left one. In the second experiment, the water source on the left, near the food source, is used more often than the water source in the right arm. 



This work was supported by a French GIS contract on Cognitive Sciences entitled “comparison of control architectures for the problem of the action selection”, in collaboration with the Animat lab (J.Y. Donnart, A. Guillot, J.A. Meyer), LISC (G. Deffuant) and RFIA (F. Alexandre, H. Frezza). 

References 

1. S. Amari. Dynamics of pattern formation in lateral-inhibition type neural fields. Biological Cybernetics, 27:77-87, 1977. 

2. R.E. Bellman. On a routing problem. Quarterly of Applied Mathematics, 16:87-90, 1958. 

3. P. Gaussier, S. Leprêtre, C. Joulain, A. Revel, M. Quoy, and J.P. Banquet. Animal and robot learning: experiments and models about visual navigation. In Seventh European Workshop on Learning Robots, Edinburgh, 1998. 

4. P. Gaussier and S. Zrehen. PerAc: A neural architecture to control artificial animals. Robotics and Autonomous Systems, 16(2-4):291-320, 1995. 

5. J. O’Keefe and L. Nadel. The Hippocampus as a Cognitive Map. Clarendon Press, Oxford, 1978. 

6. G. Schöner, M. Dose, and C. Engels. Dynamics of behavior: theory and applications for autonomous robot architectures. Robotics and Autonomous Systems, 16(2-4):213-245, December 1995. 

7. E.C. Tolman. Cognitive maps in rats and men. The Psychological Review, 55(4), 1948. 




Progressive Construction of Compound Behavior 
Controllers for Autonomous Robots Using Temporal 
Information 



J. A. Becerra¹, J. Santos¹, and R. J. Duro² 

¹ Dpto. Computación, ² Dpto. Ingeniería Industrial, Universidade da Coruña, Spain 
{ronin, santos}@dc.fi.udc.es, richard@udc.es 



Abstract. In this work we present a methodology for the progressive construction of compound behavior controllers for real autonomous robots. Some of these behaviors require temporal processing, which is achieved by including temporal delays in the synapses of the artificial neural networks used to implement them. Starting from a set of simple behaviors implemented by means of evolved monolithic controllers, the evolution strategy employed obtains higher-level behaviors either by choosing the necessary low-level behaviors from the previously selected set or by coevolving part of the low-level behaviors together with the higher-level one. Emphasis is placed on making the behaviors robust and capable of performing on a real robot. 



1 Introduction 

As Cliff et al. [1] point out, when a robot behavior controller is implemented with several behavior modules, the complexity of the design scales with the number of possible interactions among modules. If the behaviors and their interconnections are designed by hand, two problems arise: one is the complexity just mentioned, and the other is that hand-designed behaviors are not necessarily the best, or in some cases even adequate, for the task. 

In the late eighties and early nineties artificial evolution was proposed as a means 
to automate the design procedure of these types of systems [1][2]. Many authors have 
taken up this issue and have developed different evolutionary mechanisms and 
strategies in order to obtain robotic controllers. 

When designing a complex behavior, two options are possible: a monolithic approach or a modular approach [3]. The advantage of the monolithic alternative is that no prior knowledge about possible sub-behaviors and their interrelations is necessary. In hierarchical modular architectures [4][5], behaviors can be reused, which allows previously acquired competences to be introduced, and the individual modules are usually simpler. The problem is that the behavioral decomposition is not clear in every case. 

In the work presented here, we have tried to combine, in a practical way, the monolithic approach and a hierarchical modular structure, so that complex behaviors can be generated automatically while taking into account the experience accumulated through the implementation of previous behaviors. Specifically, using this method, a designer provides the system with whatever behaviors he has available or considers useful. This initial set need not be complete and may include many unnecessary behaviors. When obtaining the higher-level controller, the evolution process will select those lower-level behaviors from the initial set that are useful for the assigned task and will ignore the rest. If some part of the global behavior cannot be obtained through the interconnection of the available modules, a new monolithic module that handles this part will be co-evolved with the global controller. 



2 The Robot and the Simulation/Evolution Environment 

In the examples we present throughout this article, the robot employed is a “Rug Warrior”, a small (18.5 cm diameter), simple and cheap circular robot. It only has two DC motors, two binary infrared emitters and one receiver, two photosensors, a pyrosensor, three bumper sensors and two optical encoders. The sensors and actuators are of very low quality, very noisy and imprecise, which is a plus when trying to obtain robust behaviors under the hardest possible conditions. The robot is untethered and large enough to move around freely in a human environment, such as an office or a laboratory. This is a great advantage with respect to other robots often employed in evolutionary robotics experiments (such as the Khepera robot). 

Regarding the simulation/evolution environment, we have selected an evolution strategy [6][7] as the basic evolutionary procedure, an alternative also employed by some authors, such as Salomon [8]. The main advantage of this approach to evolution is that it helps in problems where the level of epistasis, and consequently of deceptiveness, is high. In our case, the original structure of evolution strategies has been adapted to the development of ANN-based robot controllers by uniformly distributing the initial population throughout the search space and subdividing it into races, to prevent the clustering that arises when small populations of low-fitness individuals are initially distributed at random. In addition, we have made use of a tournament selection scheme, adding a small probability of crossover (typically 0.1). 
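The adapted evolution strategy described above can be sketched in Python as follows. Only the uniform spread of the initial population, the subdivision into races, tournament selection, and the 0.1 crossover probability come from the text; the Latin-hypercube-style stratified initialization, tournament size 2, uniform crossover, clipped Gaussian mutation, independent races without migration, and the absence of elitism are assumptions. The default race count, race size, and generation count mirror the wall-following run reported in Section 3. The fitness function is supplied by the caller and stands in for the energy-based criterion, which is not specified here.

```python
import random


def stratified_init(n, dim, lo, hi, rng):
    """Latin-hypercube-style init: each dimension is split into n strata, one sample per stratum."""
    columns = []
    for _ in range(dim):
        strata = list(range(n))
        rng.shuffle(strata)
        columns.append([lo + (hi - lo) * (s + rng.random()) / n for s in strata])
    return [[columns[d][i] for d in range(dim)] for i in range(n)]


def tournament(pop, fits, rng, k=2):
    """Return the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fits[i])]


def evolve(fitness, dim, races=4, race_size=64, generations=160,
           p_cross=0.1, sigma=0.1, lo=-1.0, hi=1.0, seed=0):
    rng = random.Random(seed)
    # each race is spread uniformly over the search space and evolves on its own
    pops = [stratified_init(race_size, dim, lo, hi, rng) for _ in range(races)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        for r, pop in enumerate(pops):
            fits = [fitness(x) for x in pop]
            for x, f in zip(pop, fits):
                if f > best_fit:
                    best, best_fit = list(x), f
            offspring = []
            for _ in range(race_size):
                parent = tournament(pop, fits, rng)
                if rng.random() < p_cross:          # occasional uniform crossover
                    mate = tournament(pop, fits, rng)
                    child = [a if rng.random() < 0.5 else b for a, b in zip(parent, mate)]
                else:
                    child = list(parent)
                # Gaussian mutation, clipped to the search bounds
                child = [min(hi, max(lo, g + rng.gauss(0.0, sigma))) for g in child]
                offspring.append(child)
            pops[r] = offspring
    return best, best_fit
```

In a controller setting, each genome would be the direct encoding of a network's synaptic weights (and delays), and `fitness` would run one or more noisy simulated lives of the robot.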

The evolutionary algorithm can evolve a single low-level behavior, evolve a high-level behavior that makes use of the low-level behaviors we provide it with, or co-evolve both simultaneously. The encoding scheme employed for the neural networks that implement the controllers is simply a direct genotypic representation of the phenotype in terms of synaptic weights and, when appropriate, delays. As fitness criterion we employed a global, energy-based criterion. 

The simulation module is based on the Khepera simulator [9], adapted to the Rug Warrior. The responses of the sensors and actuators employed in the simulator were obtained by measuring the real responses of the robot's sensors and actuators under different conditions, such as different levels of ambient light, etc. 

For behaviors developed in simulation to be transferable to the real world, the simulation must meet certain criteria, such as those established by Jakobi [10], which usually imply handling different levels and types of noise. In addition to the traditional random noise applied to sensed values and actuator commands, our simulations include the following three types of noise: 
• Generalization noise. It consists of randomly changing some characteristic of the robot or the environment (such as the orientation of a sensor) and maintaining this change for one whole life of the robot. This is necessary, for instance, if the robot is working on battery power, as different levels of battery charge imply different wheel speeds for the same command. 

• Systemic noise. An especially important type of noise is related to defects or particular traits in the operation of the real robot. 

• Temporal noise. This type of noise is used in order to obtain behaviors that are tolerant to variations in the time elapsed between events. 
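To make the difference between the noise types concrete, the hypothetical `NoisyRobotModel` below is a small Python sketch of where each one enters a simulated life. Generalization noise is drawn once per life and held constant, systemic noise is a fixed defect of the particular robot, the traditional noise is redrawn at every step, and temporal noise jitters the time between events. All magnitudes (gain range, angle spread, the 0.95 wheel defect, the jitter width) are made-up illustrative values, not the authors' settings.

```python
import random


class NoisyRobotModel:
    """Hypothetical simulator wrapper; every noise magnitude here is illustrative."""

    def __init__(self, rng: random.Random, left_wheel_defect: float = 0.95):
        self.rng = rng
        # Generalization noise: drawn once per life and kept constant,
        # e.g. a wheel gain depending on battery charge and a sensor mounting angle
        self.wheel_gain = rng.uniform(0.85, 1.0)
        self.sensor_angle_offset_deg = rng.gauss(0.0, 3.0)
        # Systemic noise: a fixed quirk of this particular robot
        self.left_wheel_defect = left_wheel_defect

    def wheel_speeds(self, left_cmd: float, right_cmd: float, sigma: float = 0.02):
        # Traditional random noise, redrawn at every step
        left = left_cmd * self.wheel_gain * self.left_wheel_defect + self.rng.gauss(0.0, sigma)
        right = right_cmd * self.wheel_gain + self.rng.gauss(0.0, sigma)
        return left, right

    def sensor_angle(self, nominal_deg: float) -> float:
        return nominal_deg + self.sensor_angle_offset_deg

    def event_delay(self, nominal_steps: int, max_jitter: int = 2) -> int:
        # Temporal noise: vary the time elapsed between events
        return max(1, nominal_steps + self.rng.randint(-max_jitter, max_jitter))


def new_life(seed: int) -> NoisyRobotModel:
    """Each evaluation (life) of a controller gets freshly drawn per-life parameters."""
    return NoisyRobotModel(random.Random(seed))
```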

Obviously, noise makes obtaining a given behavior slower and more difficult. It may cause the environment to be perceived very differently in each evaluation of the robot, forcing the robot to adopt conservative, compromise solutions. 



3 Constructing Compound Behaviors 

In this section we will consider the progressive construction of a global "homing" behavior. In the examples presented here, when temporal processing was required, we introduced temporal delays in all or some of the synapses of the network. These delays are a crude model of the length of the synapses found in nature. 
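A synaptic delay simply means that a synapse reads the input from some number of steps in the past. The Python sketch below shows one unit with delayed synapses and why such a unit can separate a flashing light from a static one, which is the distinction the homing task needs. The rectified-linear activation and the hand-set weights (+1 on the current input, -1 on the input one step earlier) are assumptions for illustration; in the paper both weights and delays are evolved.

```python
from collections import deque


class DelayedSynapseNeuron:
    """Rectified-linear unit whose k-th synapse reads the input delays[k] steps in the past."""

    def __init__(self, weights, delays, bias=0.0):
        assert len(weights) == len(delays)
        self.weights = list(weights)
        self.delays = list(delays)
        self.bias = bias
        size = max(self.delays) + 1
        # history[d] holds the input received d steps ago (zeros before the first input)
        self.history = deque([0.0] * size, maxlen=size)

    def step(self, x: float) -> float:
        self.history.appendleft(x)          # the oldest value drops off the right end
        s = self.bias + sum(w * self.history[d] for w, d in zip(self.weights, self.delays))
        return max(0.0, s)


def total_response(signal) -> float:
    # Hand-set weights: +1 on the current input, -1 on the input one step ago,
    # so the unit fires on every rising edge of the light signal
    neuron = DelayedSynapseNeuron(weights=[1.0, -1.0], delays=[0, 1])
    return sum(neuron.step(x) for x in signal)
```

A static light produces a single response at onset, whereas a light flashing on and off every step keeps producing responses, so even this one-unit network already carries the temporal information that a purely reactive controller would lack.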

Initially, we developed four controllers for the robot: monolithic wall following, co-evolved compound wall following using an escape from collisions module, monolithic homing, and compound homing using all the previously evolved controllers. 

With regard to wall following, in figure 1.a we display the operation of the robot controlled by a monolithic controller obtained in 160 generations of an evolutionary process with 0.1 crossover probability and 4 races with 64 individuals each. The simulations included all the noise types described above, and the controller is an ANN with time delays in the synaptic connections of the first layer. 

The robot seems to oscillate a great deal in this behavior. This is because, as it must try to avoid crashing at all costs, it needs to explore its environment very closely. Even so, it will sometimes crash and either get stuck, as shown in figure 1.a, or become unstuck with a lot of friction and thus come out disoriented, as shown in the top left corner of the same picture. In this second case, the robot lost its bearings after crashing and had to search for the wall all over again. 

Fig. 1. Wall following behavior. a) Monolithic controller. b) Compound behavior (monolithic + escape from collisions). c) On the real robot. 

Fig. 2. Homing behavior. a) Simulation. b) Real robot. 

If we consider the hierarchical approach, where the designer only provided an escape from collisions module and allowed the other low-level module to be co-evolved with the higher-level wall following controller, the robot oscillates much less (figure 1.b), but crashes a little more often (without getting stuck). In figure 1.c we present the operation of a real robot using the same controller. 

To obtain a monolithic homing controller, we prepared an environment with no walls, facilitating the search for home (a flashing light), but with a trap that is very similar to home (a static light). The robot requires temporal information in order to distinguish one object from the other. During the evaluation of the robot, both the trap and home are randomly positioned. 

The evolution took around 600 generations, and each robot was evaluated 16 times per generation. An order of magnitude fewer generations are required if no temporal noise is used, but the resulting controllers are not useful on real robots. The most difficult test scenario for the monolithic controller is a situation like the one presented in figure 2.a, where the static light (the trap) masks the flashing light (home); in order to go home, the robot must go around the trap without falling into it, which is what the robot does in simulation. The correspondence with reality can be seen in the picture of figure 2.b. 

After obtaining the previous behaviors, we now allow the evolution process to make use of previously acquired competences in order to obtain a more complex behavior through a higher-level controller that selects lower-level ones. The objective is the same as for monolithic homing, that is, the robot must seek home while avoiding static lights, but in this case the flashing light is hidden and the environment is not free of obstacles. Obtaining a monolithic controller for this task is difficult and time consuming, and the network that implements it is necessarily quite complex. 

Figure 3 shows the three-level architecture we evolved for this behavior. It consists of the monolithic homing behavior, an escape from collisions behavior, a two-level wall following behavior that shares the escape from collisions controller, and a high-level module that coordinates them. The operation of this controller is shown in figure 4. If the environment is relatively free of obstacles, the robot uses the escape from collisions behavior, which allows it to explore its world relatively fast. When the environment is more complex, as in the case presented in the figure, this type of behavior is not efficient, as the probability of the robot entering the box that contains home is quite low. In this case, the robot makes use of either the wall following behavior or the lower-level monolithic homing behavior, which yields a sort of crude wall following that permits a faster exploration of the wall contours. 

Fig. 3. Three level compound homing architecture. 

Fig. 4. Compound homing behavior. Home is in the right box and the trap in the left one. 
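The following Python sketch shows the structure of such a hierarchy, assuming that each higher-level module scores its children and delegates the motor command to the best-scoring one at every step. Because a selector is itself a behavior, selectors nest into levels, and the same escape module object can be shared by the wall-following level and the top level, so nothing is duplicated. The `Selector` class, the leaf behaviors, and the hand-written scoring rules are hypothetical stand-ins for the evolved networks.

```python
from typing import Callable, Dict, Sequence, Tuple

Motor = Tuple[float, float]                       # (left, right) wheel command
Behavior = Callable[[Sequence[float]], Motor]
Scorer = Callable[[Sequence[float]], Dict[str, float]]


class Selector:
    """Higher-level controller: scores its children and delegates to the best one.
    A Selector is itself a Behavior, so selectors can be nested into levels."""

    def __init__(self, children: Dict[str, Behavior], scorer: Scorer):
        self.children = children
        self.scorer = scorer

    def __call__(self, sensors: Sequence[float]) -> Motor:
        scores = self.scorer(sensors)
        chosen = max(scores, key=lambda name: scores[name])
        return self.children[chosen](sensors)


# Hypothetical leaf behaviors; sensors = [front_proximity, flashing_light_seen]
def escape(s): return (-0.5, 0.5)       # turn away from the obstacle
def follow_wall(s): return (0.8, 0.6)   # hug the wall
def go_home(s): return (1.0, 1.0)       # head for the flashing light


# Level 2: wall following built on the shared escape module
wall = Selector({"escape": escape, "follow": follow_wall},
                lambda s: {"escape": s[0], "follow": 0.5})
# Level 3: compound homing reusing escape, wall following and monolithic homing
homing = Selector({"escape": escape, "wall": wall, "homing": go_home},
                  lambda s: {"homing": s[1], "wall": 0.4, "escape": s[0] - 0.5})
```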
4 Conclusions 

In this article we have presented a structured way of obtaining compound behaviors in a poorly sensorized robot with noisy sensors operating in real environments. This implies a hierarchical structure where the different behaviors are implemented through ANNs, some of them using temporal delays to take time-related events into account. Through this structure, controllers may be used in multiple compound behaviors without duplication, and, as new behaviors are evolved, evolution becomes faster and simpler because a lot of experience, in the form of previously evolved behaviors, is available. To spare the designer from having to determine exhaustively all the necessary lower-level behaviors, we have included the possibility of cooperatively coevolving lower- and higher-level behaviors. To bridge the reality gap, several types of noise were employed and, as some behaviors require temporal processing, a type of noise that varied the temporal positions of the events the robot perceives in simulation was used. The controllers evolved in simulation transferred directly to the real robot. 
Abstract. The design of behavior generating control structures for real robots acting autonomously in a real and changing environment is a complex task. This is particularly true of the debugging process, the documentation of the encountered behavior, its quantitative analysis and the final evaluation. To successfully implement such a behavior, it is vital to couple the synthesis on a simulator and the experiment on a real robot with a thorough analysis. The available simulator tools generally only allow behavioral snapshots and do not provide the option of online interference. To remedy these shortcomings, we present a visualization tool for a posteriori graphical analysis of recorded data sets, which gives access to all relevant internal states and parameters of the system. 

The mini-robot Khepera has been chosen as the experimental platform. 

1 Introduction 

The design of behavior generating control structures for real robots acting autonomously in a real and changing environment is a complex task. In [12] the usefulness of embodiment for robotics is comprehensively pointed out, as algorithms developed solely by simulating autonomous agents in restricted and controlled environments may fail when transferred to a real system. Nevertheless, simulators such as [9] are very useful for obtaining a first executable version of a control structure generating the desired behavior. On the other hand, since these tools generally provide only momentary snapshots of the encountered behavior, judging the behavioral dynamics is very difficult. Moreover, the option of online interference, for example through parameter variations, is naturally minimal: a completely new test run is required, without the possibility of comparing two instances directly. Coupling synthesis and analysis in a feedback loop leads to an evolutionary process between the implemented behavior and the designer’s knowledge. Therefore, we propose to extend the common two-step program in behavior design (create a satisfactory simulator solution, then adapt it to the real robot case) with a third stage, an a posteriori holistic analysis of the encountered behavior (Fig. 1). The latter requires the development of appropriate software tools, such as the one presented in this paper. Our basic approach to implementing behavior generating control structures for autonomous agents follows the broad outline of the so-called nouvelle Artificial Intelligence [2]. Complex behavior is produced by the interaction of ’simple’ modules active in parallel, such as the Braitenberg patterns [1], artificial neural networks, geometric calculations and rule-based reasoning, without relying on extensive and resource-inefficient internal representations of the environment. 

Fig. 1. The proposed design cycle, with a photograph of the mini-robot Khepera [3]. 

Fig. 2. A schematic of the input data flow of the visualization tool, with a full-scale view of its graphical representation, shown together with the corresponding simulated environment (40 cm × 35 cm, total runtime: 18 sec). The control panel (bottom right) of the visualization tool permits selecting a specific part of the data set in question, starting or stopping the visualization at a particular point, and viewing the data stepwise. 
The basic modules are either hardwired or adaptive. In the first case, one may rely on genetic algorithms and evolutionary programming [4]. An alternative is to consider hand-designed algorithms, wherein the optimization process is left to the designer. This approach usually leads to the design of a more complex and easier-to-understand behavior. In the second case, one may attempt the implementation of adaptive, self-organizing structures. Here, either online learning [7], [13], which allows adaptation to an unknown environment, or off-line training [8], which, although limited in the case of a changing environment, generally produces more accurate results, is conceivable. Finally, [5], [6] and [10] may be seen as complementary to the presented approach. Firstly, evolutionary robotics instead of hand design is used to build the controllers. Secondly, the reality gap between simulation and real robot behavior is bridged not by a posteriori analysis, but by an a priori set-up of valid simulations. Common to all these approaches is the need for data documentation permitting a quantitative analysis, which previously has only been possible through simulations [11]. By means of the graphically represented data, an evaluation of the encountered behavior becomes possible. Hence, the designer is able to conduct an optimization process by appropriately varying the system and training parameters without carrying out a considerable number of test runs. The visualization tool (Fig. 2) described in the following section is conceived to actively support this process. 
2 Building a dynamical view of the environment 

The main purpose of the presented software tool is to build a holistic view of the robot’s environment, which may be subject to dynamical changes. Note that the perceived distribution of obstacles is purely subjective, i.e. derived from the sensory-motor data of the robot. To keep abreast of environmental changes, special features (Fig. 2) permit suppressing obsolete parts of the encountered obstacle history. Every navigational task, such as transport or homing, is based on a position calculating process. Since the mini-robot Khepera in its basic configuration does not have the far-ranging active sensors necessary to obtain topological information, we apply an explicitly geometrical method. The current position is obtained by odometric path integration, using the incremental encoder values n_L, n_R as variables and the wheel distance d as well as the advancement per pulse Δl as system parameters. Since the designer knows the exact geometry of the real environment, one may compare the robot’s perception to reality. This is demonstrated by an experiment using an environment of rectangular geometry (75 cm × 60 cm). The robot follows the walls for roughly two rounds (92 sec, i.e. 1126 program cycles) and stops at an internal angle of 71.7° instead of the correct 90°. The following results concerning angular errors due to parameter variations (original values: d = 52 mm, Δl = 0.08 mm) have been obtained by means of the visualization tool (Tables 1 and 2): 
Table 2. Angular error Δα for variations of d. 
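The path integration described above can be written as a standard differential-drive odometry update. The Python sketch below is one common formulation (midpoint heading integration), not necessarily the exact update used on the Khepera; only the parameter names and default values d = 52 mm and Δl = 0.08 mm come from the text, and `odometry_step` is a hypothetical name.

```python
import math
from typing import Tuple

Pose = Tuple[float, float, float]   # x [mm], y [mm], heading theta [rad]


def odometry_step(pose: Pose, n_left: float, n_right: float,
                  d: float = 52.0, dl: float = 0.08) -> Pose:
    """Integrate one cycle of encoder increments (pulses) into the pose estimate."""
    x, y, theta = pose
    s_left = n_left * dl                   # distance travelled by each wheel [mm]
    s_right = n_right * dl
    ds = 0.5 * (s_left + s_right)          # distance travelled by the robot centre
    dtheta = (s_right - s_left) / d        # heading change [rad]
    # midpoint rule: move along the average heading of this cycle
    x += ds * math.cos(theta + 0.5 * dtheta)
    y += ds * math.sin(theta + 0.5 * dtheta)
    return (x, y, theta + dtheta)
```

Because the heading increment is divided by d, any error in the assumed wheel distance scales every turn by the same factor and accumulates over laps, which is why comparing the integrated path with the known geometry is an effective way to tune d and Δl for an individual robot.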



Thus the presented tool may be used to optimize the odometric parameters for individual real robots. Moreover, it gives the designer an indication of the time interval during which the position calculation system works sufficiently well. This may be seen in another experiment, where the same environment as before was used, but with an additional light source. An event is defined as locating a light source. After the first detection, the robot was able to recognize the light source five times, and then registered it as a different one (a confidence area of 20 cm in Manhattan distance around the light source position was used). Hence, the positioning system worked well for 297 sec, i.e. 3825 update cycles. As already mentioned, the infrared proximity sensors are used to construct an estimate of the current environmental structures. Moreover, these data also form the sensor space on which different exploration modules, such as obstacle avoidance (OA), edge following (EF), turning (Turn) and point-to-point navigation (Nav), work in parallel; these are visualized in the control panel (Fig. 3). The spatial form of the sensor characteristics may be customized in order to simulate sensor degradation or the use of other kinds of proximity sensors. Furthermore, an artificial neural network has been applied to extract symbolic angle-to-light-source information from the subsymbolic sensor data stream (Fig. 4). 
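The light-source experiment relies on a simple rule for deciding whether a detection corresponds to an already known source. The Python sketch below assumes that the 20 cm confidence area is a threshold on the Manhattan distance between the odometric estimate and a stored source position, with the closest match winning and unmatched detections registered as new sources. The function name and the treatment of the boundary (distance equal to 20 cm counts as a match) are hypothetical choices.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # position estimate in mm


def register_light(estimate: Point, known: List[Point],
                   radius_mm: float = 200.0) -> Tuple[int, bool]:
    """Match a detected light against known sources by Manhattan distance.
    Returns (index of the source, True if it was registered as a new one)."""
    best_i, best_dist = None, float("inf")
    for i, (lx, ly) in enumerate(known):
        dist = abs(estimate[0] - lx) + abs(estimate[1] - ly)
        if dist <= radius_mm and dist < best_dist:
            best_i, best_dist = i, dist
    if best_i is None:
        # odometric drift pushed the estimate outside every confidence area
        known.append((estimate[0], estimate[1]))
        return len(known) - 1, True
    return best_i, False
```

Under this rule, the moment at which a detection first returns `True` for a source that was already known marks when the accumulated odometric error exceeded the confidence area, which is how the working time of the positioning system was measured.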
Fig. 3. Visualization of the infrared sensor values (S1-S6 and their average AS), the speed values determined by different software modules, and the values from the incremental encoders n (left). The possible positions of obstacles causing this specific sensor input are drawn around the robot (middle). A high sensor value corresponds to a nearby and unextended obstacle, whereas a low one indicates a more distant object. The average sensor noise is represented by the thickness of the lines. Finally, the sensor characteristic may be customized (right). 

Fig. 4. Visualization of the light sensor values (top middle) and the artificial neural network (bottom middle). Different thresholds (top middle/right) can be set to analyze the photosensitivity of the implemented algorithms. 

Fig. 5. A complete data set obtained from a real Khepera acting in a real environment, shown with a schematic of the corresponding actual obstacle structure (80 cm × 80 cm, total runtime: 208 sec). 

Finally, a comparison between the simulator solution (Fig. 2) and the real robot’s implementation (Fig. 5) of the Dynamical Nightwatch’s Problem [8] reveals a shortcoming that is inherent to all simulations: position determination is assumed to be perfect, which is necessary for a correct representation. Note that path integration is usually not used in simulations, whereas in real systems it is still the main position calculating method. Hence, the most restrictive limitation on the real robot’s performance arises from erroneous position calculation due to wheel slippage and unknown fabrication tolerances. As demonstrated above, the presented visualization tool is particularly apt to assist in minimizing errors due to the latter. 
3 Conclusion 

The presented visualization tool for the mini-robot Khepera permits the extension of the common two-step design program for behavior generation by a third stage: an a posteriori analysis of the encountered behavior. This allows a synthesis-analysis feedback loop to be set up that evolutionarily improves the envisaged behavioral design. In particular, the holistic visualization of the encountered behavior enables the designer to thoroughly document, analyze, evaluate and compare the performance of his implementation. 
Abstract. Nap-of-the-Earth (NOE) flight and the development of advanced Unmanned Air Vehicles (UAV) require further integration of sensors and onboard processing. Flying insects feature a panoramic eye which relies on Optical Flow (OF) for guidance and obstacle avoidance. We describe the development of a miniature thrust-vectoring UAV featuring an analog electronic eye based on biological motion detection. The system is tested in the laboratory on a custom-built whirling-arm rig. Preliminary tests have demonstrated satisfactory maneuverability under remote control. The project combines the fields of neurobiology, robotics, and aerospace. A VLSI implementation of the vision system is planned. 



1 Introduction 

Nap-of-the-Earth (NOE) flight is dangerous because obstacles are close and their perception is challenging [1]. Flying insects exhibit NOE and obstacle avoidance abilities thanks to their wide Field of View (FOV) compound eyes, which seem tailored for optical flow-based flight guidance and control [2]. Furthermore, flies perform a remarkable neural fusion of visual, inertial, and aerodynamic senses [3]. The goal of our visuomotor control testbed is to demonstrate how insect vision can be applied to Unmanned Air Vehicles (UAV). This work continues a preliminary study of reactive speed and altitude control using Optical Flow (OF) [4]. 

Section 2 summarizes our effort to achieve a design scaled to eventually carry 
the entire sensory-motor system. Section 3 describes how we apply Optical Flow 
and its impact on sensor design. We explain how we use Optical Flow with a 
flight simulation in section 4 and then present preliminary tests with our indoor 
flight testbed in section 5. We conclude with further miniaturization in mind. 

2 UAV Design 

The main constraint on the UAV’s design stems from our requirement to perform
low-speed flight tests within the limited space of our laboratory. Furthermore,
our need to demonstrate reactive maneuvers for terrain following and obstacle
avoidance, as well as hovering, implies small size and weight. The
aircraft is mounted on a whirling-arm testbed.
We produced a single-rotor vectored-thrust Vertical Take-Off and Landing
(VTOL) design. The thrust-vectoring concept uses vanes to redirect the flow and
permits outstanding maneuverability [5]. Recent UAV projects also explore this
technology [6]. Our design results from the following considerations:

- Hovering ability is required through single motor control, 

- Compactness with a good thrust/weight ratio, and mechanical and aerodynamic simplicity,

- Best shape for 360 deg FOV eye and embodiment, 

- Axisymmetric configurations simplify inertial calculations.

Aerodynamic calculations [7] for the rotor and thrust-vectoring vanes led to the
construction of a 3-degree-of-freedom Proof of Concept (POC) featuring:

- A 35 cm diameter remote-controlled variable-pitch rotor,

- A controlled, geared 300 W motor spinning the rotor at 6000 RPM,

- A single vane immersed in the propeller flow, with a remote-controlled elevon,

- An opto-electronic rotor tachymeter.

The assembly weighs 0.85 kg. Balanced control of the three actuators (rotor
speed, blade pitch, vane elevon) combined with visual and inertial sensing en- 
ables us to vary thrust and aircraft pitch to control attitude, altitude, and speed. 



3 Eye Design 

We favored a camera-type assembly (for compactness and simplicity) over a compound
eye design. It embeds a one-dimensional 20-pixel linear photoreceptor array and a
disposable camera lens.

An essential component of biological visual motion processing, whether it 
takes place in insects or humans, is the Elementary Motion Detector (EMD). 
Each EMD detects motion occurring in a particular direction within a small 
part of the visual field. The airborne photoreceptor array connects to an array 
of analog electronic EMDs derived from those of the fly. These EMDs were 
developed for an earlier mobile robot [8]. The retina is tilted so that its FOV 
covers the forward and downward region (figure 1). 
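The EMD principle can be illustrated with a minimal software sketch. The fly-derived analog circuits of [8] are not reproduced here; instead, assuming a time-of-travel reading of the EMD, the hypothetical function below detects when a contrast edge first crosses a threshold in each of two adjacent photoreceptor signals and converts the delay Δt into an angular velocity Δφ/Δt. The threshold, the sampling interval `dt`, and all function names are illustrative assumptions.

```python
import numpy as np

def first_crossing(signal, threshold):
    """Index of the first sample above `threshold`, or None if it never crosses."""
    above = np.flatnonzero(np.asarray(signal, dtype=float) > threshold)
    return int(above[0]) if above.size else None

def time_of_travel_emd(s1, s2, dt, delta_phi, threshold=0.5):
    """Hypothetical time-of-travel EMD for two adjacent photoreceptors.

    s1, s2    : sampled receptor signals (in the preferred direction,
                receptor 1 sees the edge first)
    dt        : sampling interval (s)
    delta_phi : interreceptor angle (rad)
    Returns delta_phi / delay in rad/s, or None when the edge is not seen by
    both receptors or moves in the null direction.
    """
    i1 = first_crossing(s1, threshold)
    i2 = first_crossing(s2, threshold)
    if i1 is None or i2 is None or i2 <= i1:
        return None
    delay = (i2 - i1) * dt
    return delta_phi / delay
```

Applied to each of the 19 adjacent receptor pairs of the 20-pixel array, such a detector would give one angular-velocity estimate per pair; a pair that sees the edge in the wrong order returns None, a crude stand-in for direction selectivity.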



Fig. 1. Photoreceptor array moving at altitude h and horizontal speed V above ground.

Fig. 2. ASF of a single photoreceptor (width 0.8 mm) for 11 defocusing distances
(13 to 30 mm). [Axes: projected retinal position (mm) vs. amplified photoreceptor
voltage (V); curves labelled by photoreceptor distance from lens (mm).]
We tuned the receptor’s Angular Sensitivity Function (ASF) to the sampling
period (the interreceptor angle) Δφ for improved motion detection and anti-aliasing.
We determined the ASF of the “lens-photoreceptor” system by translating a light
source across the FOV while measuring the electrical response of a single
photoreceptor for various defocus distances (Figure 2).
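A measured ASF such as those in Fig. 2 is commonly summarised by its angular half-width (often written Δρ), which can then be compared with Δφ. The sketch below assumes a single-peaked response sampled at known retinal positions and a hypothetical distance `f_mm` from the optical centre to the retina. It estimates the full width at half maximum by linear interpolation and converts the two crossing positions to viewing angles.

```python
import numpy as np

def asf_half_width_deg(positions_mm, response, f_mm):
    """Full width at half maximum of a sampled, single-peaked ASF, in degrees."""
    x = np.asarray(positions_mm, dtype=float)
    y = np.asarray(response, dtype=float)
    y = y - y.min()                      # remove the baseline offset
    half = y.max() / 2.0
    peak = int(np.argmax(y))
    # Walk outwards from the peak while the response stays at or above half max.
    i = peak
    while i > 0 and y[i - 1] >= half:
        i -= 1
    j = peak
    while j < len(y) - 1 and y[j + 1] >= half:
        j += 1
    if i == 0 or j == len(y) - 1:
        raise ValueError("half-maximum crossing lies outside the sampled range")
    # Linear interpolation between the samples that straddle the half maximum.
    left = x[i - 1] + (half - y[i - 1]) * (x[i] - x[i - 1]) / (y[i] - y[i - 1])
    right = x[j] + (half - y[j]) * (x[j + 1] - x[j]) / (y[j + 1] - y[j])
    # Retinal position -> viewing angle through the optical centre.
    return float(np.degrees(np.arctan(right / f_mm) - np.arctan(left / f_mm)))
```

Repeating this for each defocus distance gives the half-width as a function of defocus, from which a setting with Δρ close to Δφ can be picked.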



4 Nap-of-the-Earth Flight Simulation 



The control system relies on flight regimes interpolated from attitude and thrust
flight data. The aircraft initially climbs to horizontal flight with no visual
feedback [9]. As in insects, some maneuvers are performed in a preprogrammed,
open-loop fashion. Our simulation assumes that a steady thrust and attitude can be
maintained through tachymeter and inertial-sensor feedback.

When flying horizontally at altitude h and constant velocity V over horizontal
ground [10], the retinal velocity for a linear array is:

$$v_{ret} = \frac{f\,V\,\sin^2(\theta_{ret} + \gamma)}{h\,\cos^2(\gamma)} \qquad (1)$$

where $f$ is the distance between the optical center and the retina, $\theta_{ret}$ is
the angle between the optical axis and V, and $\gamma$ is the angle between the optical
axis and the photoreceptive pixel axis.
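Equation (1) is easy to evaluate numerically. The sketch below is a direct transcription, with θ_ret the angle between the optical axis and V and γ the angle between the optical axis and the pixel axis. It assumes angles in degrees and the units of Fig. 3 (f in mm, V in m/s, h in m, giving mm/s). The function name is hypothetical. A ray aligned with V (θ_ret + γ = 0) sees no image motion, while a ray pointing straight down (θ_ret + γ = 90 deg, γ = 0) sees fV/h, about 5.2 mm/s for the Fig. 3 values.

```python
import math

def retinal_speed(theta_ret_deg, gamma_deg, f, V, h):
    """Retinal image speed, Eq. (1): f V sin^2(theta_ret + gamma) / (h cos^2(gamma)).

    theta_ret_deg : angle between optical axis and velocity V (deg)
    gamma_deg     : angle between optical axis and the pixel's viewing axis (deg)
    f, V, h       : optical-centre-to-retina distance, ground speed, altitude
    """
    ray = math.radians(theta_ret_deg + gamma_deg)   # angle between the ray and V
    g = math.radians(gamma_deg)
    return f * V * math.sin(ray) ** 2 / (h * math.cos(g) ** 2)

# Fig. 3 parameters: f = 13 mm, V = 2 m/s, h = 5 m.
forward = retinal_speed(0.0, 0.0, 13.0, 2.0, 5.0)     # ray along V
downward = retinal_speed(90.0, 0.0, 13.0, 2.0, 5.0)   # ray straight down
```

Sweeping γ across the pixel axes for a fixed θ_ret gives the per-pixel reference flow that the terrain-following scheme tries to maintain.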

Fig. 3. Polar plot of (1): retinal speed (mm/s) vs. ray angle (deg) with respect
to the optical axis ($\theta_{ret}$ = … deg, f = 13 mm, V = 2 m/s, h = 5 m).

Fig. 4. Whirling-arm testbed.



Equation (1) shows (Fig. 3) that the forward part of the FOV generates nearly 
no Optical Flow [11] and responds poorly to ground height variations. Assuming 
the aircraft starts flying at a preprogrammed ground speed over flat terrain,
we set a reference OF distribution with (1). Terrain following is then achieved
by maintaining the reference OF through altitude variations. We use predefined
thrust and pitch combinations at constant horizontal speed without introducing 
spurious rotation-related OF. 

Since the forward view is the most important for obstacle avoidance, we devised
a weighted-average OF fusion scheme giving most importance to the forward FOV
and less to the downward FOV. This paradigm was inspired by the response fields
and dendritic structures of the frontal neurons VS1 and VS2 of the blowfly [12].
A reference Optical Flow corresponding to a predefined reference speed $V_{ref}$
and altitude is fused into a reference weighted average:

$$OF_{ref\,ave} = \frac{\sum_{i=1}^{N} w_i\,OF_{ref,i}}{\sum_{i=1}^{N} w_i} \qquad (2)$$

where i = 1..N ranges from the front to the most downward-viewing photoreceptor
axes. We then relate a similar in-flight OF weighted average $OF_{ave}$ to
$OF_{ref\,ave}$: $OF_{ratio} = (OF_{ref\,ave} - OF_{ave})/OF_{ref\,ave}$. OF variation is
inversely proportional to the square root of altitude, so we produce a request for
a new altitude $h_{req} = \Delta h_{ref}\,\mathrm{sign}(OF_{ratio})\sqrt{|OF_{ratio}|}$. Finally,
the average divergence over a few forward-looking EMDs triggers evasive maneuvers
for large vertical obstacles.
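The fusion step can be sketched as follows. The linearly decreasing weights are an assumption made for illustration: the text only states that forward axes weigh more, following VS1/VS2. The ratio is the normalised difference (OF_ref,ave − OF_ave)/OF_ref,ave between the reference and in-flight averages. Function names are hypothetical.

```python
import numpy as np

def weighted_of_average(of, weights):
    """Weighted mean of per-EMD optical flow; index 0 is the most forward axis."""
    of = np.asarray(of, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * of) / np.sum(w))

def of_ratio(of_ref_ave, of_ave):
    """Normalised difference between the reference and in-flight OF averages."""
    return (of_ref_ave - of_ave) / of_ref_ave

# Illustrative (assumed) weights for N = 19 EMDs: forward axes count most.
N = 19
weights = np.linspace(1.0, 0.1, N)
```

In a control cycle, the reference average would be computed once from Eq. (1) at the reference speed and altitude, and the in-flight average from the 19 EMD outputs with the same weights, so that the ratio is zero whenever the measured flow matches the reference.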




Fig. 5. NOE simulation with 20-photoreceptor retina, optical axis -40 deg, FOV 75 deg 
(within the lens limits). This run did not trigger the divergence-based evasive routine. 



A simple NOE flight (Figure 5), programmed with Scilab, depicts the path of
the aircraft eye’s optical center. Flies fly closer to the ground when faced with
a headwind; we simulated this by progressively decreasing the aircraft speed
while retaining the same request to maintain a reference OF (not shown).
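The headwind behaviour follows from Eq. (1): for a ray pointing straight down the image speed is fV/h, so the ventral angular flow is V/h, and holding it at a reference value OF_ref forces h = V/OF_ref. The toy loop below is a hypothetical proportional altitude regulator acting on that single downward-looking signal, not the Scilab simulation of Fig. 5. It shows the altitude settling in proportion to ground speed.

```python
def simulate_of_regulation(speeds, of_ref, h0=5.0, gain=0.5, steps_per_speed=200):
    """Toy OF regulator: for each ground speed V, adjust h so that V/h -> of_ref.

    Returns the altitude reached at the end of each constant-speed segment.
    """
    h = h0
    final_altitudes = []
    for V in speeds:
        for _ in range(steps_per_speed):
            of = V / h                        # ventral optical flow (rad/s)
            error = (of - of_ref) / of_ref    # too much flow -> climb
            # Equivalent to h += gain * (V / of_ref - h): converges to V / of_ref.
            h = max(0.1, h + gain * error * h)
        final_altitudes.append(h)
    return final_altitudes
```

For example, with OF_ref = 0.4 rad/s (V = 2 m/s at h = 5 m, as in Fig. 3), slowing to 1.5 m/s and then 1.0 m/s settles the altitude near 3.75 m and 2.5 m.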

5 Experimental Rig 

Our indoor flight-testing requirement led to the construction of a pantographic
whirling-arm rig (Figure 4). It prevents yaw and roll of the UAV but permits
variation in pitch (±30 deg) and altitude (0.3 to 3 m). The model is powered by
two car batteries (24 V, 15 A) fed to the aircraft through heavy-duty bearings.
Visual, inertial, and tachymeter signals are transmitted from the UAV to the
rig’s base through a slip-ring. The 20 photoreceptor signals are preamplified
onboard and fed to 19 EMD boards scanned by an acquisition card in a PC
running Real-Time Linux. Flight commands are output via the parallel port to a
standard radio-control model transmitter. The onboard radio receiver commands
the rotor speed controller, the collective pitch, and the elevon servos. We can
recover the aircraft manually in case of software malfunction.

Manually remote-controlled flight tests were performed in the laboratory to
assess both rig and aircraft reliability. Horizontal speeds reached 6 m/s. High
speeds tend to force the whirling arm to remain horizontal, so maneuvers are then
less realistic due to inertial forces. Vertical thrust pulses (> 2 m/s) were
demonstrated by increasing rotor blade pitch or RPM. Rotor blade pitch (collective
pitch in helicopter terminology) is beneficial for precise thrust adjustments,
especially in ground effect. Maneuverability was impressive: aggressive flight
direction reversals were achieved in 2-3 seconds at high flight speeds using elevon
commands. Stability is provided by the large rotor. Steady hovering and forward
flight transitions were demonstrated in and out of ground effect.
6 Conclusion

This UAV project combines the fields of neurobiology, robotics, and aerospace to 
test biologically-inspired sensory-motor concepts in a challenging environment. 
We use a low number of Elementary Motion Detectors to achieve reactive flight 
control. The Optical Flow field is computed and fused with inertial inputs by 
the real-time flight control computer. Although the testbed constrains tests to
3 degrees of freedom, flight tests indicate that it will enable us to explore
aspects of insect flight, especially in wind gusts.

Our UAV design is suitable for the scale we have chosen but might not 
be easily scaled down or up because of aerodynamic or structural limitations, 
respectively. Our objective is to integrate the visuomotor paradigm into a VLSI 
neuromorphic circuit which could then be embedded into Micro Air Vehicles
(MAV) to assist guidance in cluttered surroundings and indoor operations.
Abstract. A framework for understanding and exploiting embodiment is
presented which is not dependent on any specific ontological context. This 
framework is founded on a new definition of embodiment, based on the 
relational dynamics that exist between biological organisms and their 
environments, and inspired by the structural dynamics of the bacterium 
Escherichia coli. Full recognition is given to the role played by physically 
instantiated bodies, but in such a way that this can be meaningfully abstracted 
within the constraints implied by the term ‘embodiment’, and applied in a 
variety of operational contexts. This is illustrated by ongoing experimental 
work in which the relational dynamics that exist between E. coli and its 
environment are applied in a variety of software environments, using Cellular 
Automata (CA) with artificial ‘sensory’ and ‘effector’ surfaces, producing 
qualitatively similar ‘chemotactic’ behaviours in a variety of operational 
domains. 



1 Introduction 

This paper is concerned with the nature of embodiment — it proposes a new and 
precise definition of the term derived from asking ‘what is it that is special about the 
relationship between bodies and the world?’ and then suggesting how the features that 
are identified can be put to use independently of any specific ontological context. By 
focusing on the relationship between a system and its environment as the basis for 
embodiment, it is possible to analyse the significance of physical qualities without 
grounding the analysis itself in a material ontology. Material features are significant
insofar as they condition the system-environment relationship, but not just because
they are material.

The definition of embodiment presented yields practical and conceptual benefits. It 
provides a basis for quantifying embodiment, which is significant for behavioural 
robotics, for example with regard to understanding how to calculate and maximise 
embodiment, as well as understanding the problems that arise in moving between 
simulation and actual physical environments (cf. [1]). 

The ontological neutrality of the definition also enables inter-disciplinary 
discussion about embodiment, for example between the behavioural robotics and 
intelligent software agents communities. It does this by providing a common 
framework for addressing embodiment, regardless of context, whilst recognising
the uniqueness of different forms of embodiment.

On the same basis, it can defuse the tension between Artificial Life (ALife) and 
embodiment (cf. [2, 3]). From an ALife perspective, embodiment represents a 
theoretically well-grounded alternative to the tradition of symbol manipulation in AI. 
However, if physical embodiment is a necessary condition for the emergence of at
least some life-like behaviour, this bodes ill for the synthesis of such behaviours in
non-physical media, a central theme in ALife [4].



2 Embodiment and System - Environment Dynamics 

Grounding embodiment in system-environment dynamics fits well with existing 
bodies of research. Ray’s work on Tierra illustrates the significance of dynamics in 
generating the phenomena associated with living systems [5]. 

Striking examples of analyses of ‘real world’ embodiment that appeal to dynamical 
relationships between systems and the environments in which they are observed can 
be found in, for example, [6, 7], and particularly in Beer’s work [8, 9]. 

Kushmerick [10] illustrates some of the difficulties of adherence to domain 
specificity in embodiment, which become apparent when trying to apply lessons 
learnt from the material world to that of the software agent. Without an underlying
definition of embodiment, transferral of lessons learnt in the material world can only
occur at the level of manifest phenomena, rather than at that of underlying or 
generative causes. 

Franklin comes closest to the spirit of the perspective advocated here, referring to 
embodiment in terms of “autonomous agents structurally coupled with their
environment” [11]. Etzioni [12] adopts a similar stance. 



3 Embodiment as Situated Structural Coupling between System 
and Environment 

We define what it is for a system to be embodied as follows: 

A system X is embodied in an environment E if perturbatory channels exist 
between the two. That is, X is embodied in E if for every time t at which both X 
and E exist, some subset of E’s possible states has the capacity to perturb X’s
state, and some subset of X’s possible states has the capacity to perturb E’s
state. 

This relational definition draws on Maturana and Varela’s influential notion of 
structural coupling [13, 14]. At once, embodiment becomes quantifiable, ontology 
independent and directly linked to behaviour. 

Degrees of Embodiment. The definition above describes the conditions under which 
structural coupling is possible. The mere fact that an object is embodied in an
environment is insufficient to guarantee that any interesting interaction will occur 
between the two. It simply affords the possibility of perturbatory interaction. 

There is huge scope for variation in this common embodiment relationship across
different instances of embodiment, be it as a result of design or natural development.
As the perturbatory relationship between system and environment is quantitatively
measurable, embodiment itself becomes measurable, expanding our viewpoint
beyond the issue of whether or not a particular system is embodied*. One possibility is
to ground a metric in the total complexity (as rigorously defined in [16], or 
alternatively [17]) of the dynamical relationship between system and environment, 
over all possible interactions. Factors such as the total bandwidth of the perturbatory 
channels between system and environment, as well as the computational power in the 
dynamics of their interaction, may contribute to this complexity. 
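Both the binary definition of Section 3 and the “bandwidth” factor mentioned here can be made concrete on a toy system. Assuming, for illustration only, deterministic and time-invariant update rules x' = F(x, e) and e' = G(e, x) over finite state sets, the sketch below measures, for each direction, the largest number of bits per step that one system’s state can impose on the other’s next state (log2 of the number of distinct outcomes it can induce). Under this reading, X is embodied in E when both directions carry more than zero bits. The functions and the additive score are hypothetical and are not a metric proposed in the text.

```python
import math

def channel_bits(update, own_states, other_states):
    """Largest log2(#distinct next states) the other system can induce.

    update(own, other) -> next own state. For a deterministic update, with
    `own` fixed and `other` chosen freely, this is the per-step capacity in bits.
    """
    best = 0.0
    for s in own_states:
        outcomes = {update(s, o) for o in other_states}
        best = max(best, math.log2(len(outcomes)))
    return best

def is_embodied(fx, fe, x_states, e_states):
    """Perturbatory channels exist in both directions (E -> X and X -> E)."""
    return (channel_bits(fx, x_states, e_states) > 0
            and channel_bits(fe, e_states, x_states) > 0)

def embodiment_score(fx, fe, x_states, e_states):
    """Hypothetical two-way score: total bits per step across both channels."""
    return channel_bits(fx, x_states, e_states) + channel_bits(fe, e_states, x_states)

# Toy example: X reads E fully (2 bits); E only feels X's lowest bit (1 bit).
def fx(x, e):
    return (x + e) % 4

def fe(e, x):
    return e ^ (x & 1)

states = range(4)
```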

Environmentally Coupled Cellular Automata as a Generic Class of Embodiable 
Dynamical System. Cellular Automata (CA) designed to engage in a mutually 
perturbatory relationship with some environment are suitable for exploring and
exploiting the embodiment relationship articulated above, in that they can participate
in the interaction defining such a relationship. Precedent for such a form of CA, in
contrast to the more common closed form (cf. [18, 19]), has been set by Varela’s
Bittorio [20] (see also [21] for a demonstration of CA-environment coupling in an 
evolutionary context). 



4 Bots and Bacteria — E. coli on the Internet 

This section provides context for the definition of embodiment outlined above, by 
illustrating how E. coli’s autonomous and adaptive chemotactic behaviour emerges,
for an observer, from the embodiment relationship between bacterium and 
environment. An experimental programme is underway, designed to explore this 
definition of embodiment by instantiating the equivalent embodiment relationship 
between a CA-based software system and a variety of ontologically distinct 
operational environments. 

Structural Dynamics in E. coli. Despite being far less structurally complex than 
some multi-cellular organisms, and equipped with only non-directionally sensitive 
receptors and effectively binary state effectors, E. coli exhibits adaptive and 
consistently sensitive (cf. ‘dynamic range,’ below) chemotactic behaviour in response 
to nutrient gradients over five orders of magnitude [22]. 

The dynamics of two structural processes, highly connected signalling pathways
within the cell and spatial clustering of receptors on the surface of the bacterium [23],
play key roles in the emergence of E. coli’s chemotaxis, operating within and in
relation to its physical environment via sensory and effector surfaces (nutrient
receptors and flagella, respectively).

The signalling pathway is a CA-like system, composed of a number of
interconnected elements with relatively simple interaction rules between them. 
Interactions are based on the transfer of phosphoryl groups, the presence of which at 
flagella motor sites promotes ‘tumbling’ (random reorientation). Internal processes 
produce phosphoryl groups, encouraging frequent tumbling. Encounters with 
chemoattractants inhibit this process, shifting the behavioural bias towards ‘running’ 
(smooth swimming). In addition, receptors are constantly methylated, which promotes 
tumbling, even at higher concentrations of chemoattractants (cf. [22] for more detail). 
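The qualitative loop described above (internal activity promotes tumbling, attractant inhibits it, methylation slowly restores it) can be caricatured in a few lines. This is a hypothetical phenomenological sketch, not the phosphorylation network itself. The tumble probability decays exponentially with the difference between the log-concentration and an adaptation state m that slowly tracks it. The cell therefore responds to relative changes in attractant and returns to baseline tumbling at any steady concentration. All parameter values are illustrative assumptions.

```python
import math

class RunTumbleCell:
    """Toy adaptive run/tumble controller (phenomenological, not biochemical)."""

    def __init__(self, base_tumble=0.3, sensitivity=2.0, adapt_rate=0.05):
        self.base_tumble = base_tumble    # tumble probability when fully adapted
        self.sensitivity = sensitivity    # gain of attractant inhibition
        self.adapt_rate = adapt_rate      # speed of methylation-like adaptation
        self.m = 0.0                      # adaptation state (log-concentration units)

    def tumble_probability(self, c):
        """Attractant above the adapted level (log c > m) suppresses tumbling."""
        signal = math.log(c) - self.m
        return min(1.0, self.base_tumble * math.exp(-self.sensitivity * signal))

    def step(self, c):
        """Return this step's tumble probability, then let m drift toward log c."""
        p = self.tumble_probability(c)
        self.m += self.adapt_rate * (math.log(c) - self.m)
        return p
```

After a step increase in attractant, the tumble probability drops sharply (the cell “runs”) and then relaxes back to its baseline as m catches up, regardless of the absolute concentration.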



* As suggested in [15] with respect to robots. 
Receptor clustering plays a pivotal role in E. coli’s dynamic range [23]. This
occurs at low attractant concentrations, and has the effect of pooling the output of a 
group of receptors when one member is activated. At higher concentrations the 
receptors disperse, providing sensitivity to attractant binding that would be effectively 
ignored by large clusters. 

E. coli on the Internet. A Java program, ‘Phenomorph’, is under development. Its
relationship to Web pages is based on E. coli’s relationship to its environment. The
purpose is not to achieve optimal information search (cf. [24, 25]), but to investigate,
in conjunction with planned future experiments^, the validity of the concept of
embodiment presented above.

At the heart of Phenomorph lies a uniform 1D binary-state environmentally
coupled CA which generates dynamics roughly analogous to those inherent in E.
coli’s structure. Receptor clustering and methylation are also simulated, whilst
keywords defined by the user at run-time play the part of chemoattractants. When
Phenomorph visits a web page containing defined keywords, the CA is
proportionately stimulated. The CA’s global activation level determines the likelihood
that Phenomorph will ‘run’ rather than ‘tumble’, each of which is implemented
through hyperlink following. See [26] for a more detailed description of Phenomorph.
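The sensing-processing-acting loop just described can be sketched as follows. This is not Phenomorph’s code (see [26]). It is a hypothetical illustration assuming a ring of binary cells updated by an elementary CA rule (rule 18 is an arbitrary choice), a “sensory surface” that switches cells on with probability equal to the fraction of user keywords found on a page, and an “effector surface” that chooses ‘run’ with probability equal to the CA’s global activation. Receptor clustering, methylation, and hyperlink following are omitted.

```python
import random

class CoupledCA:
    """Hypothetical environmentally coupled 1D binary CA on a ring."""

    def __init__(self, n_cells=64, rule=18, seed=None):
        self.cells = [0] * n_cells
        self.rule = rule                  # elementary CA rule, Wolfram numbering
        self.rng = random.Random(seed)

    def stimulate(self, keyword_hits, total_keywords):
        """Sensory surface: switch cells on in proportion to keyword matches."""
        fraction = keyword_hits / total_keywords if total_keywords else 0.0
        for i in range(len(self.cells)):
            if self.rng.random() < fraction:
                self.cells[i] = 1

    def update(self):
        """One synchronous step; (left, centre, right) selects the rule bit."""
        c, n = self.cells, len(self.cells)
        self.cells = [
            (self.rule >> (c[(i - 1) % n] << 2 | c[i] << 1 | c[(i + 1) % n])) & 1
            for i in range(n)
        ]

    def activation(self):
        """Global activation level: fraction of cells that are on."""
        return sum(self.cells) / len(self.cells)

    def choose_move(self):
        """Effector surface: 'run' with probability equal to the activation."""
        return "run" if self.rng.random() < self.activation() else "tumble"
```

With rule 18, both an unstimulated CA and a fully saturated one decay to silence after one step, so the agent tumbles unless the pages it visits keep supplying keywords.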
Initial Evaluation. Although not yet developed sufficiently to produce infotaxis^, and
lacking comparative studies in other operational environments, Phenomorph shows a
variety of autonomously generated responses to environmental features, which arise 
directly from and are determined solely by the interplay between the environment and 
its CA-based structural dynamics. Environmental variety is constantly filtered by the 
(very simple) form of Phenomorph’s embodiment. 



5 Conclusions 

The definition of embodiment presented offers immediate opportunities to bridge the 
interpretative gap between disciplines concerned with very different forms of 
embodiment, something previously hampered not least by the lack of any firm 
definition of the term. This understanding of embodiment has the potential to provide 
benefits to practitioners on both sides of the ontological divide between physical and 
non-physical systems and environments — from measuring the embodiment of robots 
to evolving distributed autonomous control systems that exploit emergent behavioural 
strategies across a range of operational environments. A great deal of experimental 
work also remains to be done investigating this concept of embodiment across 
operational environments, and developing possibilities for its exploitation. 



^ These will consist of embodying Phenomorph, via appropriate sensory and effector surfaces, 
in an abstract parameter space, and a physical environment. 

^ An informatic counterpart of chemotaxis. Phenomorph has yet to attain a level of behavioural
‘fitness’ that can be compared with that of the acutely honed E. coli. 
Telepresence is the future of multimedia systems and will allow participants to
share professional and private experiences, meetings, games, parties. The
concepts of Distributed Virtual Environments are a key technology to implement
this telepresence. Using Virtual Humans within the shared environment is an
essential supporting tool for presence. Real-time realistic 3D avatars will be
essential in the future, but we will need interactive perceptive actors to populate
the Virtual Worlds. The ultimate objective in creating realistic and believable
virtual actors is to build intelligent autonomous virtual humans with adaptation,
perception and memory. These actors should be able to act freely and
emotionally. Ideally, they should be conscious and unpredictable. But how far are we
from such an ideal situation? Our interactive perceptive actors are able to perceive
the virtual world, the people living in this world and in the real world. They 
may act based on their perception in an autonomous manner. Their intelligence 
is constrained and limited to the results obtained in the development of new 
methods of Artificial Intelligence. However, the representation under the form of 
virtual actors is a way of visually evaluating the progress. In the future, we may 
expect to meet intelligent actors able to learn or understand a few situations. 
Abstract. This paper proposes a paradigm for specifying virtual human
agents’ level of autonomy. The idea we present in this paper aims at optimising
the required complexity of agents in order to perform realistic simulations.
The paradigm is based on the distribution of the required virtual human agent 
“intelligence” to other simulation entities like groups of agents, and objects. 



1 Introduction 

Several methods have been introduced to model learning processes, perceptions,
actions, behaviours, etc., in order to build more intelligent and autonomous virtual
agents. Our goal in this paper is to propose a paradigm to define virtual agents
endowed with different degrees of behavioural autonomy.

First of all, we present some useful concepts assumed in this work. A virtual human
agent (hereafter simply referred to as an agent) is a humanoid whose behaviours
are inspired by those of humans [15]. The term group will be used to refer to a
group of agents, and the term object for an interactive object of the environment.
Agents, groups, and objects constitute the entities of the simulation. A high level of
behavioural autonomy concerns the ability to simulate complex behaviours. In this
paper, we consider that the ability of agents to act autonomously can be included
in the agents (agents-based application), the groups (groups-based application) or
the objects (objects-based application). Among others, interactivity, complex
behaviours, intelligent abilities and execution frame rate are directly related to the
level of autonomy (LOA). Table 1 presents this relation using three kinds of
behavioural control. Guided control represents the lowest level of autonomy, where
the behaviours have to be provided by an external process. Programmed control
implies using a notation (language) to define possible behaviours. Autonomous
behaviour concerns the capability of acting independently while exhibiting control
over one’s internal state [28].



Table 1. Characteristics of different levels of autonomy (LOA).

Behaviour control          Guided    Programmed   Autonomous
Level of autonomy          Low       Medium       High
Level of intelligence      Low       Medium       High
Execution frame-rate       Low       Medium       High
Complexity of behaviours   Low       Variable     High
Level of interaction       …         Variable     Variable

2 Related Work

Several works have discussed the various ways to simulate and interact with
virtual agents. Zeltzer [29] presents a classification of the levels of interaction and
abstraction required in different applications. Thalmann [4] proposes a new
classification of synthetic actors according to the method of controlling motion.
Reynolds [22] presented a model of aggregate motion. In recent work, a crowd model
has been introduced using different abstractions of behaviours, such as guided
crowds [17]. Considering agent-object interaction tasks, some semantic information
has been included within the object description. In particular, object specific
reasoning [11] creates a relational table to inform object purpose, and smart objects
were introduced [10] containing interaction information.



3 LOA Related to Individuals 



Several works agree on the requirements of an “autonomous” or “intelligent” agent:
autonomous behaviour, action, perception, memory, reasoning, learning, self-control,
etc. [15], [18], [22]. Many methods have been developed to model autonomous
agents: L-systems [18], vision systems [21], rule-based systems [22], learning
methods [25], etc. Nevertheless, guided or programmed agents can also be useful
depending on the application. Table 2 exemplifies the three kinds of agent
autonomy using two different agent tasks.



Table 2. LOA present in different agent-oriented tasks.

Guided
- Agent goes to a specific location: the agent needs to receive, during the simulation, a list of collision-free positions.
- Agent applies a specific action: the agent needs to receive information about the action to be applied.

Programmed
- Agent goes to a specific location: the agent is programmed to follow a path while avoiding collision with other agents and programmed obstacles.
- Agent applies a specific action: the agent is programmed to manage where and how the action can occur.

Autonomous
- Agent goes to a specific location: the agent is able to perceive information in the environment and decide on a path to follow to reach the goal, using environment perception or memory (past experiences).
- Agent applies a specific action: the agent can decide on an action to be applied. This action can be programmed, imitated, or already present in memory (past experiences).



4 LOA Related to Groups of Agents 

In the case of crowd simulation, we usually want many virtual human agents
without having to deal with individual behaviours. Contrary to the previous section,
our goal here is to describe methods that focus intelligence on a common group
entity that controls its individuals. We call crowd and group applications, where
individual complexity is less required, groups-based applications. In this case, the
intelligence abstraction can be included in the groups, providing more autonomy to
the groups rather than to the individuals.

Considering levels of autonomy (LOA), we have classified crowd behaviours into
three kinds: i) guided crowds, whose behaviours are defined explicitly by the
users; ii) programmed crowds, whose behaviours are programmed in a script
language; iii) autonomous crowds, whose behaviours are specified using rules or other
complex methods. Table 3 exemplifies this classification of crowd autonomy using
two different crowd tasks.





Table 3. LOA present in different group-oriented tasks.

Guided
- Group goes to a specific location: the group needs to receive, during the simulation, a list of “in-between” positions in order to reach the goal.
- Group reacts to a matched event: the group needs to receive information about the matched event and the reaction to be applied.

Programmed
- Group goes to a specific location: the group is programmed to follow a path avoiding collision with other agents and programmed obstacles.
- Group reacts to a matched event: the group can manage events and reactions, which are programmed.

Autonomous
- Group goes to a specific location: the group is able to perceive information in the environment and decide on a path to follow to reach the goal, using environment perception or memory (past experiences).
- Group reacts to a matched event: the group can perceive a matched event and decide on the reaction to be applied. This reaction can also be programmed or already present in the group memory.



5 LOA Related to Objects 



Whenever the simulation needs to handle complex agent-object interactions, many
difficult issues arise. Such difficulties are related to the fact that each object has its
own movements, functionality and purposes. One can consider that agents’
perceptions can solve some simple tasks, for instance single-hand automatic grasping
of small objects. But this is no longer possible for interactions with objects that have
intricate functionality of their own, such as a lift. In fact, the more information
related to the object is given, the higher its level of autonomy (LOA). Table 4
illustrates how an agent must proceed according to the different LOAs for three
different interactive objects of the environment.



Table 4. LOA present in different object-oriented tasks.

Guided
- Door: the agent has to move its arm to an attainable and meaningful location on the door, and control its movement to open it.
- Sign: the agent recognises that the sign has an arrow and recognises the direction shown.
- Lift: the agent recognises where the call button is, how and when the door opens, how and where to enter the lift, when and how to get out, etc.

Programmed
- Door: the agent has to move its arm to the right place, but the door opens by itself.
- Sign: the agent recognises the sign, but the direction is given, with no recognition required.
- Lift: the agent accesses the current lift state and decides only its own moves accordingly.

Autonomous
- Door: the door takes control of the agent, telling it exactly where to put its hand and the complete movement of the door.
- Sign: the sign gives a new direction to each agent that passes nearby.
- Lift: the lift takes control of the agent’s movements and gives it a complete plan, based on primitive actions, to perform the interaction.



6 The Proposed Paradigm 

As presented in the last sections we considers that the ‘Intelligence” is not only 
included in the virtual human agents, but can be also included in groups and objects. 
Considering the abstraction levels; guided, programmed and autonomous behaviours, 
we present a schema that includes the entities group and object, as showed in Figure 
1 . We can so classify a simulation in terms of the autonomy distribution among its 
entities, i.e., a simulation ( SO, can be translated as a function of three components: 
agents, groups and objects; S,- = f f LOKAeents). LOACGroups). LOAfObiects) ) 
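The classification can be captured by a small data structure. As a minimal sketch, assume the function f simply records the triple of per-entity LOAs (the text leaves f unspecified) and that the levels are ordered guided < programmed < autonomous. The hypothetical helper below then reports which entity types carry the highest autonomy in a given simulation.

```python
from dataclasses import dataclass
from enum import IntEnum

class LOA(IntEnum):
    GUIDED = 1
    PROGRAMMED = 2
    AUTONOMOUS = 3

@dataclass(frozen=True)
class Simulation:
    """S_i described by the LOA of its three entity types."""
    agents: LOA
    groups: LOA
    objects: LOA

    def most_autonomous(self):
        """Names of the entity types holding the highest LOA (ties kept, sorted)."""
        levels = {"agents": self.agents, "groups": self.groups, "objects": self.objects}
        top = max(levels.values())
        return sorted(name for name, level in levels.items() if level == top)

# Example: guided agents and groups, with the "intelligence" placed in the objects.
objects_based = Simulation(LOA.GUIDED, LOA.GUIDED, LOA.AUTONOMOUS)
```

Under this reading, an objects-based application is one where `most_autonomous()` returns only `"objects"`, and mixed cases show up as ties.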
Figure 1: Level of autonomy vs. intelligent entity. [Axes: level of autonomy (guided,
programmed, autonomous) vs. intelligent entity (agents, groups, objects).]



7 A Case Study 

The chosen case study was designed to deal with different kinds of control. We 
consider the environment of a virtual city [6] containing some streets, a supermar- 
ket (S), a train station (TS), autonomous objects (direction signs to go to the TS) 
and other buildings. Let G be a group of virtual agents endowed with different 
LOAs, which can be guided, programmed or autonomous. The goal of group G is to 
go from the supermarket to the train station. We ran four kinds of simulation with
different group controls, with or without interaction with objects.



Facts:

- Goal(G) = Go from S to TS

Simulation 1

- G = Guided group
- Autonomous objects exist but G cannot recognise them.
- G does not know its goal.
- Initially, G receives a location to reach.
- G is able to walk to reach this location.
- G is not able to avoid collision with obstacles.

Simulation 2

- G = Autonomous group
- Autonomous objects exist and G can recognise them.
- G knows its goal.
- Initially, G is able to recognise the autonomous objects, go to a location near them and interact with them. The autonomous object is able to recognise where G wants to go and to give the correct direction.
- G is able to follow object instructions.

Simulation 3

- G = Programmed group
- Autonomous objects exist but G cannot recognise them, because G is not programmed for that.
- G knows the programmed goal.
- Initially, G is able to translate a programmed goal (TS) into a path to be applied.

Simulation 4

- G = Autonomous group
- Autonomous objects do not exist.
- G knows its goal.
- G is provided with vision and environment knowledge.
- G can find a path to reach the goal by perceiving the environment (signs) on its own.
- G is able to perceive and avoid collision with obstacles.






Figure 2: (left) the starting and goal points; (center) group G going to interact with an autonomous object; (right) comparison data between the four simulations.

In Figure 2 (right), some parameters (all except execution time) represent subjective
data that are hard to measure, so we decided to quantify them intuitively on four levels:
25 (low), 50 (lower medium), 75 (upper medium) and 100 (high).
8 Conclusions

We propose in this paper a paradigm to distribute autonomy among the entities of the simulation. The idea we dealt with here concerns the possibility of improving the frame rate of execution, as well as optimising the required complexity, by distributing some knowledge and autonomy to other entities of the simulation: groups and objects. This paradigm has been tested in the context of a Virtual City project [6], because we have to simulate several virtual human agents that can act in different ways and apply different actions.
Abstract. It has been recently suggested by a number of authors that modelling of emotions and related motivational systems in agents might have great practical value, apart from the interest of providing possible explanations for the emotional mechanisms of human agents. Emotions, or needs, may be used as signalling mechanisms between different subsystems (subagents) inside an agent, as well as between different agents. In this paper, we investigate some problems that may arise with emotional agents. Since needs and emotions are largely global, stable reaction tendencies, they may exhibit rigidities that lead to different forms of maladaptive behavior, i.e. behavior that is not well suited to the present environment of the agent. We investigate emotional learning in agents by an utterly simplified decision-theoretical model. We show that even in this very simple model agents may develop maladaptive patterns of behavior that closely resemble patterns found in emotional disorders in humans. The maladaptive behavior patterns are due to non-optimal values for the two decision parameters, which are functions of the prior beliefs of the agent.



1 Introduction 

A central issue in artificial life, as well as in artificial intelligence, is the study of adaptive agents. Adaptive agents learn from their environment, possibly by constructing a model of the world, so as to maximize some desired criterion. The agents' model of the world, and the ensuing behavior tendencies, are not rigid, but a product of experience and interaction with the environment. This implies that different agents have different models due to differential learning. Each agent has its own history, which has led to a different model of its world.

It has been recently suggested by a number of authors, e.g. [2,6], that mod- 
elling of emotions and related motivational systems in agents might have great 
practical value, apart from the interest of providing possible explanations for 
the emotional mechanisms of human agents. For example, emotions, or needs, 
may be used as signalling mechanisms between different subsystems (subagents) 
inside an agent, as well as between different agents [5, 1, 4, 6]. A seminal proposal 
in this line of research was Simon’s interruption theory [5], where emotions in- 
terrupt ongoing goal processing in order to direct processing resources to more 
urgent goals. 
In this paper, we investigate problems that may arise with emotional agents.
Since needs and emotions are largely global, stable reaction tendencies, they 
may exhibit rigidities that lead to different forms of maladaptive behavior, i.e. 
behavior that is not well suited to the present environment of the agent. In 
humans, such maladaptive emotional states, and the ensuing maladaptive be- 
havior patterns, have been a subject of a large body of research in psychology 
and psychiatry. Modelling emotional learning in agents, we show how agents may 
develop maladaptive patterns of behavior that closely resemble patterns found 
in emotional disorders in humans. 

We show that maladaptive emotion-based behavior can emerge from a very 
simple model of the agents’ decision-making system. The basic emotional sys- 
tem in our model can be reduced to two parameters that can be learned e.g. 
by reinforcement learning. Non-optimal values for these parameters may lead 
to maladaptive behavior patterns that resemble such disorders as depression, 
anxiety, and mania. A cognitive interpretation of the parameters in terms of 
(Bayesian) prior beliefs shows how emotional disorders are related to erroneous 
beliefs on the world. The results are relevant to the design of emotional agents, 
and may give insight into the processes involved in human maladaptive behavior. 



2 Minimal Agent model 

To begin with, we describe a very simple agent model that we use to illustrate
maladaptive emotion-based behaviors. We accomplish this with an agent that 
has just one long-term goal, three internal needs (emotions), two types of sensory 
perceptions, and three action alternatives. 

Goal and needs. In our highly simplified agent model, the agent has (or seems 
to have) just one ultimate long-term goal. This is the combination of actions, 
states or objects that the agent needs to perform or obtain to be useful for its 
designer (in the case of a robot) or to spread its genes (for biological agents). 
Typically, the satisfaction of the goal is not a binary truth value, but rather a 
scalar quantity S that measures how well the agent performed during its lifetime. 

Though the agent has only one (ultimate) goal, we consider that its decision- 
making system consists of several modules that correspond to different needs. 
(We ignore here the modules needed for perception, action etc.) The needs are 
closely connected to emotions because it may be considered that emotions are 
expressions of current needs. Thus we model emotions in our system indirectly, 
via the underlying needs. Biological systems show that it is useful for an agent to
construct its decision-making device from components, each corresponding to one
type of constraint of the environment, or one type of subgoal. In our terminology,
a need thus corresponds to one of those subgoals. An emotion can be considered
as a signal that expresses the urgency of a given need.

Firstly, to generate behavior that is useful for directly satisfying the (long- 
term) goal, the agent may be considered to generate a set of internal states that 
we may call the goal need. 
If the agent just had its goal need, its behavior would consist of trying to do
anything that lets it satisfy its goal need at the present instant. Such a short- 
sighted behavior would, however, lead to rather small goal satisfactions in a 
hostile environment. Thus we include in our agent model a second need which 
consists of trying to avoid any external events that might destroy the agent. We 
call this need the safety need. 

Another need that is often in conflict with the goal need is the energy need.
This means the need to get energy when energy level is low, by means of drinking, 
eating, charging batteries, etc. For simplicity, we assume in our model that the 
agent gains energy automatically with time. 

Perception of environment. When some need-relevant event occurs, we assume the perceptual system computes two conditional probabilities: first, $P(\mathrm{event} \mid \mathrm{goal})$, the conditional probability that an event that satisfies the goal need would produce such a perception, and second, $P(\mathrm{event} \mid \mathrm{danger})$, the conditional probability that an event that threatens the safety need, i.e. is dangerous, would produce such a perception. Such probabilities are typically produced by most pattern-recognition methods, e.g. by computing distances from a template.
It is here assumed for simplicity that all events belong to two uniform categories: 
those satisfying the goal need and those threatening the safety need. 

Actions. On the basis of the perceptual quantities, the agent must decide its 
course of action among the following alternatives. 1) Approach: This means that 
the event or object is explored further, and the possible increase in goal satis- 
faction is enjoyed. 2) Avoidance: This means that the agent tries to avoid the 
event or object. 3) No action. It is assumed that approach and avoidance consume
the same amount of energy, whereas the no action alternative does not consume any.
Since the energy supplies are constantly replenished, not doing anything effectively
means resting to get energy.

Competition of needs. Choosing among the action alternatives is here done using a simple competition of need signals. (A more sophisticated justification of this procedure is given below.) The need modules have certain weight parameters $w_{\mathrm{goal}}$ and $w_{\mathrm{danger}}$ that are based on information they have about the context and the possible urgencies of the needs. On the basis of the perceptions and these parameters, the need modules transmit their priorities, or need signals, in the form of the quantities $w_{\mathrm{goal}}\,P(\mathrm{event} \mid \mathrm{goal})$ and $w_{\mathrm{danger}}\,P(\mathrm{event} \mid \mathrm{danger})$. In other words, they multiply the perceptual probabilities by some quantities that express the importance of the need. Similarly, the energy need transmits a signal, which does not depend on perceptions, but possibly on some internal state of the system. We can normalize the signals so that the signal given by the energy need is always equal to 1. Thus the decision between the action alternatives is made by choosing the maximum among the following quantities, where the goal signal stands for approach, the danger signal for avoidance and the energy signal for no action:

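$$\mathrm{needsignal}(\mathrm{goal}) = w_{\mathrm{goal}}\, P(\mathrm{event} \mid \mathrm{goal}) \tag{1}$$

$$\mathrm{needsignal}(\mathrm{danger}) = w_{\mathrm{danger}}\, P(\mathrm{event} \mid \mathrm{danger}) \tag{2}$$

$$\mathrm{needsignal}(\mathrm{energy}) = 1 \tag{3}$$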
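The competition in eqs. (1)-(3) can be sketched as a small Python function. This is an illustrative sketch only: the function name `choose_action` and the action labels are our own, and the tie-breaking order is an assumption because the text does not specify one.

```python
def choose_action(p_event_given_goal: float,
                  p_event_given_danger: float,
                  w_goal: float,
                  w_danger: float) -> str:
    """Pick an action by competition of the three need signals, eqs. (1)-(3).

    The energy signal is normalised to 1, so both weights are relative to it.
    """
    signals = {
        "approach": w_goal * p_event_given_goal,      # goal need, eq. (1)
        "avoid": w_danger * p_event_given_danger,     # safety need, eq. (2)
        "no action": 1.0,                             # energy need, eq. (3)
    }
    # The largest need signal wins; on a tie, max() keeps the first key above.
    return max(signals, key=signals.get)


print(choose_action(0.8, 0.1, w_goal=2.0, w_danger=2.0))  # approach  (1.6 vs 0.2 vs 1)
print(choose_action(0.3, 0.9, w_goal=2.0, w_danger=2.0))  # avoid     (0.6 vs 1.8 vs 1)
print(choose_action(0.3, 0.2, w_goal=2.0, w_danger=2.0))  # no action (0.6 vs 0.4 vs 1)
```

Because the energy signal is fixed at 1, an event can only trigger approach or avoidance when its weighted probability exceeds 1. This is why the ratio of each weight to the energy need matters in the discussion of maladaptive behavior.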
Utility-theoretical interpretation. The decision rule given above can be interpreted in the framework of utility theory. Due to lack of space, we present here only the results of the analysis; details can be found in [3]. To decide between the three alternatives given above, the agent uses simple Bayesian decision procedures. We assume that the agent has estimates of the following quantities. First, $\Delta S(\mathrm{goal})$: the increase in (lifetime) goal satisfaction due to an encounter with a goal-satisfying event. Second, $\Delta S(\mathrm{danger})$: the expected decrease in (lifetime) goal satisfaction due to destruction caused by an encounter with a dangerous event. Moreover, the agent has estimated from the environment the following prior probabilities, i.e. probabilities that express the agent's belief about the occurrence of the two kinds of events before it observes the present environment: $P(\mathrm{danger})$, the prior probability that a dangerous event occurs, and $P(\mathrm{goal})$, the prior probability that a goal-satisfying event occurs.

By some algebraic manipulation, we see that the weights $w_{\mathrm{goal}}$ and $w_{\mathrm{danger}}$ are essentially given by two quantities, the prior expected utilities $P(\mathrm{goal})\,\Delta S(\mathrm{goal})$ and $P(\mathrm{danger})\,\Delta S(\mathrm{danger})$. These quantities can be interpreted as the level of danger or goal satisfaction that the agent expects to receive on average. They contain the essence of the prior beliefs that the agent holds about the environment, and individual differences can be traced to different estimates of these prior expectations.

3 Maladaptive emotion-based behavior 

By maladaptive emotion-based behavior, we mean here configurations where the 
emotional components of the agent do not allow as large a goal satisfaction as
would otherwise be possible. Such maladaptive behavior patterns are closely related
to emotional disorders, sometimes called neuroses. 

In our model, two parameters, $w_{\mathrm{danger}}$ and $w_{\mathrm{goal}}$ (and the corresponding parameters in the Bayesian framework), govern the behavior of the agent. Suboptimal values of these parameters are the main cause of maladaptive behavior. There might be several reasons why the parameters have suboptimal values. First, the environment may change. In particular, the environment where the parameters were first learned may no longer be the current one. In biological agents, this phenomenon seems to be of great importance, especially because many parameter values seem to be fixed during childhood learning. Second, the learning method might be faulty. This is usually not so much a problem in biological agents, where evolution has provided quite efficient learning methods, but in artificial agents the learning algorithms are not always efficient. It is also possible that learning in biological agents might be disturbed by diseases and environmental toxins. Third, there might be discrepancies in the long-term goals. Especially in artificial agents, there may exist several possible definitions of goal satisfaction, with correspondingly different optimal values for the parameters. We can identify three classical types of maladaptation, as discussed next.

Anxiety. If the coefficient $w_{\mathrm{danger}}$ is too high with respect to both $w_{\mathrm{goal}}$ and the constant energy need, this leads to a preponderance of avoidance behavior, reducing approach behavior and inactivity. Thus the agent avoids even events whose probability of being dangerous is relatively small. This parallels what is called anxiety (or the generalized anxiety syndrome) in humans.

In a cognitive interpretation, the weight $w_{\mathrm{danger}}$ is essentially the prior expected loss of utility due to dangerous events. An elevated value for $w_{\mathrm{danger}}$, and the associated tendency to avoidance, can thus be caused by a strong belief in the occurrence of dangerous events, or by a high value placed on safety. Both of these give rise to a cognition that exaggerates the importance of dangerous events.

Depression. If $w_{\mathrm{goal}}$ is too low compared to the constant signal emitted by the energy need, we have a situation characterized by a relative rarity of approach behavior that is not unlike the prevalent behavioral symptom of depression. Especially when combined with a low $w_{\mathrm{danger}}$, this leads ultimately to inactivity and stupor, as in severe depression in humans. In the cognitive, utility-theoretic interpretation, a very low relative value of $w_{\mathrm{goal}}$ may be due to a pessimistic stance in which the prior belief in goal satisfaction is low, or due to the low value assigned to goal satisfaction.

Mania. The third pathological state is observed when $w_{\mathrm{goal}}$ is too high and $w_{\mathrm{danger}}$ is too low. In this state, the agent ignores dangers and is excessively active in approach behavior, which resembles the symptoms of manic states in humans.
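To see how the three regimes arise from the two weights alone, the decision rule can be run on random events. This is a hypothetical illustration, not an experiment from the paper. It assumes that both conditional probabilities are drawn independently and uniformly from [0, 1), and the weight values below are arbitrary choices picked to fall into each regime. `behaviour_profile` is our own name.

```python
import random
from collections import Counter


def choose_action(p_goal: float, p_danger: float, w_goal: float, w_danger: float) -> str:
    # Competition of need signals; the energy signal is fixed at 1.
    signals = {"approach": w_goal * p_goal, "avoid": w_danger * p_danger, "no action": 1.0}
    return max(signals, key=signals.get)


def behaviour_profile(w_goal: float, w_danger: float, n: int = 1000, seed: int = 0) -> Counter:
    """Count the chosen actions over n events with random perceptual probabilities."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        # Assumption: P(event|goal) and P(event|danger) are i.i.d. uniform on [0, 1).
        counts[choose_action(rng.random(), rng.random(), w_goal, w_danger)] += 1
    return counts


regimes = {
    "balanced": (2.0, 2.0),
    "anxiety": (1.5, 10.0),      # w_danger too high
    "depression": (0.5, 0.5),    # w_goal (and w_danger) too low
    "mania": (10.0, 0.5),        # w_goal too high, w_danger too low
}
for name, (wg, wd) in regimes.items():
    print(name, dict(behaviour_profile(wg, wd)))
```

With these settings, the depression regime never leaves rest because both weighted signals stay below the energy signal of 1. The mania regime never avoids, and the anxiety regime is dominated by avoidance.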

4 Conclusion 

We modelled the motivational or emotional system of a simplified agent using a 
system of three needs and three behaviors. It was shown how the behavior of the 
agent depends basically on two parameters. Suboptimal values for these param- 
eters lead to behavior tendencies that resemble some of the behavior patterns 
encountered in humans with emotional disorders. The model gives explanations 
for the etiology of these disorders that are not unlike those given in the psycho- 
logical literature on humans, and gives a new framework for studying adaptation 
phenomena related to emotions. 
Abstract.



How can we expect an A-life Agent to learn how to perform tasks when it is not
told what those tasks are, and it is not provided with any indication or feedback as to
its performance? This is at the heart of the unsupervised learning problem. If 
the Agent were able to learn in this manner, how could specific tasks be 
communicated to it? This is the Goal setting problem. Having been set a task, 
how would the Agent go about choosing things to do that will lead it to perform 
those tasks in an orderly manner? This is at the heart of the action selection 
problem. 



1 Introduction 

This paper presents an ingenious approach to solving these three closely related 
problems, unsupervised learning, goal setting and action selection, for a class of A- 
life Agents. The method presented, the Dynamic Expectancy Model (DEM), builds on 
the “sensori-motor” ([9]), “intermediate level cognitive” model ([3]), or “cognitive
action theory” ([11]) approaches. Each uses a three-part representation for 
“knowledge” within the Agent - the “Schema”. Such schema are formulated as 
“context - action - outcome” triples. The method presented here overcomes some 
significant technical problems of these earlier models. In particular we present 
considerably improved strategies for the creation of new schema-like objects and a 
robust and flexible mechanism for action selection. 

This paper builds on the conjecture that the proper interpretation of this triple is that of an expectation, and that the strength of the connection between context and action, and the outcome, should depend only on the predictive performance of the unit. A conventional view would hold that the strength of the connection should be related to any goal- or task-specific “desirability” of the outcome. In the DEM these expectation triples are referred to as μ-hypotheses, so called because each is a “micro observation” about the Agent and its world that can be created, verified (“corroborated”) and used by the DEM. By adopting the predictive view, strength changes can be made internally and task-independently, just by testing whether the predicted event did or did not occur, without reference to reward or reliance on an external agent to indicate correctness. Strength changes can be applied as soon as the prediction is corroborated, and are always attributed to the specific μ-hypothesis that made the prediction. This leaves the μ-hypothesis uncommitted to any particular goal; learning is not task directed. In this manner the unsupervised learning problem is addressed. Prediction is used in two ways to learn: first, to corroborate μ-hypotheses (tactical learning), and second, by detecting unexpected events, to trigger the creation of new μ-hypotheses (strategic learning). This ability to learn in the absence of motivation is closely related to the latent learning phenomenon ([6], [10]).

From time to time the Agent will be called upon to perform certain tasks. By associating a degree of motivation with any particular outcome, the Agent is signalled that it should try to achieve that outcome. In this manner, the goal setting problem is addressed.

There may be many possible choices of context and action that will lead to the desired outcome. The Agent may select amongst these primarily on the basis of the corroboration between these components. Where no action may be selected because none of the required contexts are available, then, as context and outcome are uniformly represented, the Agent may chain μ-hypotheses to form a “policy map” of options where μ-hypothesis contexts match the current circumstances. This chaining process is referred to as spreading activation ([8], [10]) in the DEM. The DEM provides a uniform measure (the policy value) which allows the Agent to select between competing alternatives. In this manner the action selection problem is addressed.

Action selection and learning models based on Reinforcement Learning ([7], [13], 
[16]) propagate the effects of occasional reward backward to rank sense-act pairings 
in the form of a “policy map” according to iteratively developed estimates of future 
reward. Neural networks provide a sensed input to output mapping function which 
may be refined by a variety of well known learning methods ([5], [14]). Genetic 
Learning algorithms rank and refine sets of condition-action rules according to some 
“fitness function” (past payoff in the case of [4]). Classifier Systems ([1], [10]) 
combine the notion of propagating the effects of occasional reward, using a “bucket- 
brigade” method, for behavior selection with a genetic algorithm approach to create 
new classifier selection elements. Alternatively, action selection may be pre-defined,
with little or no learning component ([2], [8]; see [15] for a useful review).

The primary purpose of this paper is to describe and define the Dynamic 
Expectancy Model mechanism, and the next sections provide a description of the 
individual processes that comprise the DEM. Each process is performed once in an 
extended “sense, corroborate, evaluate goals, create policy map, act, predict, learn” 
cycle. All the activities of the Agent are encapsulated into this cycle, which repeats ad 
infinitum. The paper concludes with a simple, but illustrative, example of the DEM in 
operation, demonstrating both latent learning and its ability to respond rapidly to 
changing motivations, followed by some discussion. 







2 μ-Hypotheses, Signs and Actions - Making Predictions

All working information in the DEM is held in five main “Lists”: the Hypothesis List, Sign List, Action List, Goal List and Prediction List¹. During each cycle the system tests the sensory apparatus of the Agent and every element of each list, modifying Lists and List elements (and selecting an action to perform) according to the processes described next.

All μ-hypotheses known to the DEM at any time are retained in the Hypothesis List (abbreviated to $\mathcal{H}$). The form of the μ-hypothesis ($h_i \in \mathcal{H}$) is:

$$h_i : s^c_i \wedge a_i \xrightarrow{\;t \pm \tau\;} s^p_i \qquad \text{(eqn. 1)}$$

Each μ-hypothesis should be read as an expectation of the form “performing the action $a_i$ in the context of $s^c_i$ predicts the occurrence of the condition $s^p_i$ at time $t$ in the future”. The time $t$ is bracketed by $\pm\tau$, forming a range of times that generalises the prediction in the temporal domain and overcomes the effects of sensor sampling aliasing. The overall predictive ability of each μ-hypothesis, the strength of the predictive connection, is recorded in a numeric variable, the corroboration measure ($C_m$).

The context ($s^c_i$) and outcome ($s^p_i$) terms in eqn. 1 are Signs drawn from the Sign List (abbreviated $\mathcal{S}$, thus $s^c_i \in \mathcal{S}$, $s^p_i \in \mathcal{S}$). Signs both define and detect situations that can be recognised by the Model. Signs are derived from, and form the interface to, the sensory apparatus available to a physical Agent. Signs are compound items: conjunctions of elemental sensory items (“Tokens”). At each cycle in the execution of the algorithm, each Sign is either detected or absent, and so evaluates to active or inactive for that cycle. All active Signs for a cycle are held on the Active Sign List $\mathcal{S}^*$, a subset of $\mathcal{S}$. The first time a Token is encountered, the DEM automatically creates a Sign (containing only that Token) and adds it to the Sign List. New Signs are also appended to a separate working list of new Signs, which drives the structural learning process. Tokens, Signs and actions have no a priori meaning to the learning mechanism in the Model, and may be named arbitrarily to suit the experimenter or user.

As a special case, Signs will equate directly to a “State”, a unique situation detected reliably by the Sign. A Sign may also equate to a “partially observed state” (P.O. State), a unique situation unreliably detected by the Sign. Equally, a Sign may just represent a collection of sensory conditions available to the Agent, from which it must create predictions of increasing reliability and repeatability. This notion of “State” is useful when evaluating the model formally, but it is a poor metaphor for a realistic agent problem. The latter case allows Signs to act as abstractions. Adding Tokens to a Sign makes it more specific; deleting them makes it more generally applicable. In these circumstances we would expect many “candidate” μ-hypotheses to be created, verified and possibly discarded before a viable Hypothesis List is formed.

¹ A note on notation. Each of the lists is denoted by a single, upper case, calligraphic letter ($\mathcal{H}$ (Hypothesis), $\mathcal{S}$ (Sign), $\mathcal{A}$ (Action), $\mathcal{G}$ (Goal) and $\mathcal{P}$ (Prediction)). Individual elements are denoted by lower case letters ($h$, $s$, $a$, $g$ and $p$ respectively). Sub-sets of lists, and individual elements that must be identified across steps or cycles, are indicated by superscripts (e.g. $\mathcal{S}^*$). The symbol ‘∈’ may be read as “member of”, ‘∪’ as “union of”, ‘∩’ as “intersection of” and ‘∧’ as “concurrently with”. Other symbols are explained as they are encountered.

Actions ($a_i \in \mathcal{A}$, the Action List) define the activities the Agent may perform. Where Signs define the interface to the sensory apparatus, actions connect the DEM to the Agent's physical actuators. Selected actions are placed onto the $\mathcal{A}^*$ (active) sub-list. Every action $a_i$ has an associated action cost, which indicates the relative effort that will be required to complete the action. Action costs may be expressed in any units (such as elapsed time or energy expended) that can be determined and consistently applied across all the elements of $\mathcal{A}$. The action cost measure will be used in the “cost estimation” process for goal directed action selection. The DEM also maintains a memory of recent activations and their associated timings for both Signs (from $\mathcal{S}^*$) and actions (from $\mathcal{A}^*$). Information held on these activation traces is used by the structural learning component to construct new μ-hypotheses.

A μ-hypothesis is deemed active (and so placed on $\mathcal{H}^*$) whenever both its context Sign ($s^c_i$) and its action ($a_i$) are active simultaneously ($s^c_i \in \mathcal{S}^* \wedge a_i \in \mathcal{A}^*$). A new prediction $p$ is created and added to the Prediction List $\mathcal{P}$ for every instance of an activated μ-hypothesis. Note in particular that this mechanism is invoked for all μ-hypotheses that meet these criteria, and that a prediction records three items: first, the identity of the Sign predicted ($s^p_i$, from the active μ-hypothesis); second, the time (derived from $t$ in eqn. 1 and the current time) at which the Sign is predicted to occur; and third, the identity of the μ-hypothesis that made the prediction ($h_i$). Elements of the Prediction List are active (and so placed on $\mathcal{P}^*$) when the time element recorded matches the current time within the bounds defined by $\tau$. The presence of active predictions drives the corroboration process.
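The activation and prediction step can be sketched in Python. This is a minimal illustration under stated assumptions, not the DEM implementation. Time is counted in integer cycles, and Signs and actions are plain strings. The class and function names (`MuHypothesis`, `Prediction`, `make_predictions`, `active_predictions`) and the Sign names in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MuHypothesis:
    context: str      # context Sign s^c
    action: str       # action a
    outcome: str      # predicted Sign s^p
    delay: int        # t, in cycles
    tolerance: int    # tau, in cycles


@dataclass(frozen=True)
class Prediction:
    sign: str                 # identity of the Sign predicted
    due: int                  # time at which the Sign is predicted to occur
    source: MuHypothesis      # the mu-hypothesis that made the prediction


def make_predictions(hypotheses, active_signs, active_actions, now):
    """One new prediction for every mu-hypothesis whose context and action are both active."""
    return [Prediction(h.outcome, now + h.delay, h)
            for h in hypotheses
            if h.context in active_signs and h.action in active_actions]


def active_predictions(predictions, now):
    """Predictions whose due time matches the current time within +/- tau."""
    return [p for p in predictions if abs(p.due - now) <= p.source.tolerance]


hypotheses = [
    MuHypothesis("at_door", "push", "door_open", delay=2, tolerance=1),
    MuHypothesis("at_door", "pull", "door_open", delay=2, tolerance=1),
    MuHypothesis("in_hall", "push", "at_door", delay=1, tolerance=0),
]
preds = make_predictions(hypotheses, active_signs={"at_door"}, active_actions={"push"}, now=10)
print([(p.sign, p.due) for p in preds])        # [('door_open', 12)]
print(len(active_predictions(preds, now=11)))  # 1
```

Only the first μ-hypothesis fires here, because it is the only one whose context and action are both active at cycle 10.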



3 Corroboration - Tactical Learning 

For each active prediction, the corroboration measure ($C_m$) of the μ-hypothesis responsible for the prediction ($h_i$) is modified according to:

$$C_m = C_m + \alpha\,(1 - C_m) \qquad \text{(eqn. 2)}$$

where the prediction was successful (that is, when $s^p_i \in \mathcal{S}^*$, $s^p_i$ being the Sign recorded at the time of making the prediction), and

$$C_m = C_m - \beta\, C_m \qquad \text{(eqn. 3)}$$

where the prediction was unsuccessful ($s^p_i \notin \mathcal{S}^*$). Active predictions are discarded from $\mathcal{P}$ once this step is complete.

The positive reinforcement rate, $\alpha$ ($0 < \alpha < 1$), defines the rate at which successful predictions will strengthen $C_m$. Similarly, the extinction rate, $\beta$ ($0 < \beta < 1$), defines the rate at which $C_m$ will be weakened by failed predictions. Where no prediction was made, the value of $C_m$ remains unchanged. Sequences of successful (or unsuccessful) predictions give rise to the familiar negatively accelerating learning curve, the values being normalized such that $C_m$ rises asymptotically toward 1.0 (or falls toward 0.0).
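Equations 2 and 3 can be written as a single update function. This is a sketch under assumptions: the values α = 0.2 and β = 0.1 are arbitrary, and the function name `corroborate` is hypothetical. For a run of $n$ successes starting from 0, the printed values follow the closed form $1 - (1-\alpha)^n$.

```python
def corroborate(c: float, success: bool, alpha: float = 0.2, beta: float = 0.1) -> float:
    """Update the corroboration measure C_m after one active prediction.

    success -> eqn. 2: C_m + alpha * (1 - C_m)
    failure -> eqn. 3: C_m - beta * C_m
    """
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("alpha and beta must lie strictly between 0 and 1")
    return c + alpha * (1.0 - c) if success else c - beta * c


# A run of successes traces the negatively accelerating curve toward 1.0.
c = 0.0
history = []
for _ in range(5):
    c = corroborate(c, success=True)
    history.append(round(c, 4))
print(history)  # [0.2, 0.36, 0.488, 0.5904, 0.6723]
```

A failed prediction removes the fraction β of the current value, so repeated failures decay $C_m$ geometrically toward 0.0 without ever going negative.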



4 μ-Hypothesis Acquisition - Structural Learning

Prediction, or rather the failure to predict an event, drives the structural learning component of the DEM, which is responsible for forming new μ-hypotheses. The opportunity to create new μ-hypotheses is indicated by the appearance for the first time of a previously unknown (“novel”) Sign, or by the appearance of a known but unpredicted (“unexpected”) Sign. New, unknown Signs trigger the creation by novelty method; recall that the first occurrence of a previously unknown Sign is recorded in the working list of new Signs specifically to invoke this learning method. The appearance of an unpredicted, but previously known, Sign invokes the creation by unexpected event method. Unexpected Signs are detected by comparing the active Prediction List to the active Sign List and applying the method to the unpredicted residue ($\mathcal{S}^* - (\mathcal{P}^* \cap \mathcal{S}^*)$).
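The detection of learning opportunities reduces to set operations. The sketch below treats Signs as strings and represents the Signs predicted by active predictions as a set. The helper name `learning_opportunities` is hypothetical, and separating novel from unexpected Signs in this way is our reading of the text.

```python
def learning_opportunities(active_signs: set, predicted_signs: set, known_signs: set):
    """Split the active Signs into novel and unexpected ones.

    novel:      active Signs never seen before
    unexpected: previously known Signs in the unpredicted residue S* - (P* ∩ S*)
    """
    novel = active_signs - known_signs
    unpredicted = active_signs - (predicted_signs & active_signs)
    unexpected = unpredicted - novel
    return novel, unexpected


novel, unexpected = learning_opportunities(
    active_signs={"a", "b", "c"}, predicted_signs={"a", "d"}, known_signs={"a", "b", "d"})
print(sorted(novel), sorted(unexpected))  # ['c'] ['b']
```

Sign "a" was both predicted and observed, so it offers no learning opportunity. The prediction of "d" failed, which is handled by corroboration rather than by structural learning.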

In either method a new μ-hypothesis is constructed from the novel or unpredicted Sign as $s^p_i$, together with a Sign ($s^c_i$) and an action ($a_i$) drawn respectively from the recorded activation traces of the Sign and Action Lists. The timing relationship ($t$ in eqn. 1) is derived from their relative positions in the respective memory traces. Note that the structural learning mechanism is independent of the source of the Signs and actions it will employ. Riolo ([10]) and Shen ([12]) have described broadly similar strategies for “rule” creation triggered by “surprise” events.

To limit the rate at which new μ-hypotheses are created, the user may specify a learning probability rate, $\lambda$, which determines the probability with which a new μ-hypothesis will be formed given one of these opportunities. The Dynamic Expectancy Model also defines methods for differentiating partially effective μ-hypotheses by making their component Signs more or less specific (adding or removing Tokens), and for removing ineffective μ-hypotheses. The requirements for such additional processes are considered further in [17].
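Creating a new μ-hypothesis from the activation traces can be sketched as follows. The text does not say which traced Sign and action are chosen, nor exactly how the delay is computed. Purely for illustration, we assume that the most recent traced Sign and action are used, and that the delay is the time from that action to the unexpected Sign. `create_hypothesis` and the Sign names are hypothetical.

```python
import random


def create_hypothesis(target_sign, now, sign_trace, action_trace, lam, rng):
    """With probability lam, build a (context, action, outcome, delay) tuple for target_sign.

    sign_trace / action_trace: lists of (time, item) pairs recorded before `now`,
    most recent last. lam is the learning probability rate (lambda).
    """
    if rng.random() >= lam:              # create only with probability lambda
        return None
    if not sign_trace or not action_trace:
        return None                      # nothing recorded to build from
    _, context = sign_trace[-1]          # assumption: most recent traced Sign
    t_act, action = action_trace[-1]     # assumption: most recent traced action
    delay = now - t_act                  # assumption: time from action to outcome
    return (context, action, target_sign, delay)


rng = random.Random(0)
print(create_hypothesis("door_open", 12,
                        sign_trace=[(9, "in_hall"), (10, "at_door")],
                        action_trace=[(10, "push")],
                        lam=1.0, rng=rng))   # ('at_door', 'push', 'door_open', 2)
```

Setting `lam` below 1.0 makes the creation stochastic, which matches the stated purpose of λ: limiting how many candidate μ-hypotheses are generated per learning opportunity.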



5 The Action Selection Mechanism 

At each execution cycle the Agent must have some action to perform. Normally, 
the Dynamic Expectancy Model operates in two distinct modes for action selection:
(1) goal directed selection and (2) exploratory selection. Whenever goal directed action
selection is chosen, the algorithm attempts to construct a Dynamic Policy Map
(DPM) from which it may select an action. The construction and use of the DPM is 
described later. 

Where no goal is set the system selects actions in exploratory mode. In the current 
implementation, exploratory selection is made on a random basis. Regardless of how 
the action was selected, the learning mechanism continually monitors the activities of
the Agent, and corroborates existing p-hypotheses and creates new ones according to 
the learning strategies described. 
Goals - instructions to the system to select actions with respect to some purpose - are held on the Goal List ($\mathcal{G}$). Individual goals are drawn from the Sign List. Placing a Sign on the Goal List is a signal to the Agent that it should be motivated to select actions that cause that Sign to become activated (i.e. appear on $\mathcal{S}^*$). Each goal Sign on $\mathcal{G}$ is assigned a priority. The goal with the highest priority at any time is referred to as the top-goal. Once a goal Sign has been activated (i.e. it appears on $\mathcal{S}^*$), it is deemed satisfied and removed automatically from the Goal List. The next highest priority goal becomes top-goal, or the Goal List becomes empty.

Purposive goals are, by definition, largely domain specific: they serve some purpose. The Dynamic Expectancy Model provides two distinct routes to setting goals. (1) Goals may be programmed into, or be inherent to, the Agent. (2) They may be imposed externally by directly manipulating G. The former route equates directly to our intuitive notion of a primary reinforcer. Some things, such as food for a hungry animal or water for a thirsty one, inherently motivate because they are “programmed” to do so. In a mobile robot context, the detection of a “battery_low” Sign may cause an “on_charge” Sign to be placed on the Goal List, possibly with a priority related to the extent of battery discharge. The latter route gives an experimenter a method for manipulating the goal-driven behaviour of the Agent directly.
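The Goal List behaviour described above can be sketched as below. It covers prioritised goals, automatic removal on activation, and an inherent goal such as “on_charge”. The `GoalList` class, the `battery_monitor` function and the priority rule `1 - charge_level` are illustrative assumptions. The text only says that the priority may be related to the extent of discharge.

```python
import heapq

class GoalList:
    """Goal List G: goal Signs with priorities; the highest priority is the top-goal."""

    def __init__(self):
        self._heap = []    # entries: (-priority, insertion order, sign)
        self._count = 0

    def __contains__(self, sign):
        return any(entry[2] == sign for entry in self._heap)

    def add(self, sign, priority):
        heapq.heappush(self._heap, (-priority, self._count, sign))
        self._count += 1

    def top_goal(self):
        return self._heap[0][2] if self._heap else None

    def remove_satisfied(self, active_signs):
        """Goals whose Sign is active (appears on S*) are satisfied and removed."""
        self._heap = [e for e in self._heap if e[2] not in active_signs]
        heapq.heapify(self._heap)

def battery_monitor(goals, active_signs, charge_level):
    """Hypothetical inherent goal: 'battery_low' places 'on_charge' on G, with a priority
    that grows with the extent of discharge (charge_level in [0, 1])."""
    if "battery_low" in active_signs and "on_charge" not in goals:
        goals.add("on_charge", priority=1.0 - charge_level)

g = GoalList()
g.add("A", 1.0)                          # externally imposed goal (route 2)
battery_monitor(g, {"battery_low"}, 0.2) # inherent goal (route 1), priority 0.8
print(g.top_goal())                      # A
```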



6 Building the Dynamic Policy Map 

Whenever a top-goal is available, the DEM will attempt to create a Dynamic Policy Map (DPM) to form a sequence of links from every other Sign in S to the Sign currently set as top-goal. The DPM is conveniently represented as a graph. The context Signs associated with individual p-hypotheses are its nodes, and the actions embedded within individual p-hypotheses are its arcs. The DPM is created by a process of spreading activation from the top-goal. The method used to construct the DPM is a modified form of the standard breadth-first graph search/construction algorithm. Each arc has an associated cost estimate value, C_a. This cost estimate is computed from the given action cost of a and the C_r value (eqns. 2 and 3) defined earlier:

$$C_a \leftarrow \frac{\mathrm{action\_cost}(a)}{C_r} \qquad \text{(eqn. 4)}$$

Consider a situation where the denominator of eqn. 4 is simply p = (number of successful predictions)/(total predictions made) by a p-hypothesis, i.e. the probability that the p-hypothesis predicts correctly. The cost estimate value is then reasonably interpreted as the total estimated cost for the average number of attempts (1/p) that must be made with the given p-hypothesis to achieve the transition. A similar interpretation may be placed on the case for C_r shown in eqn. 4, with the proviso that the “averages” are now biased towards recent experiences.
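The interpretation of eqn. 4 can be checked numerically. The sketch below uses hypothetical helper names. It computes the arc cost and compares it with a simulation that retries an action until the predicted transition occurs. Each attempt is assumed to succeed independently with probability p. The action cost of 2.66 is the value used in the example of section 7.

```python
import random

def arc_cost(action_cost, c_r):
    """Eqn. 4: estimated cost of an arc = action cost / corroboration value."""
    if not 0.0 < c_r <= 1.0:
        raise ValueError("corroboration value must lie in (0, 1]")
    return action_cost / c_r

def simulated_cost(action_cost, p, trials=100_000, seed=1):
    """Average total cost of retrying an action until it succeeds, each attempt
    succeeding independently with probability p (geometric number of attempts)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        attempts = 1
        while rng.random() >= p:         # failure: try again
            attempts += 1
        total += attempts * action_cost
    return total / trials

print(arc_cost(2.66, 0.5))               # 5.32, i.e. two attempts on average
print(simulated_cost(2.66, 0.5))         # close to 5.32
```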

Each node (Sign) in the graph will acquire a valence level, v, indicating the number of arcs, n, that must be traversed to reach the top-goal “node”. The top-goal has a valence level of zero. The context Sign s of any p-hypothesis that leads directly to the goal (i.e. where s′ = top-goal) has a valence level of 1, and so on. The policy value, P_v, of any node s at level n in the DPM is then expressed as the sum of the individual estimated costs C_a^(v) along a path, minimised over the paths from s to the top-goal:

$$P_v(s) = \min\left(\sum_{v=1}^{n} C_a^{(v)}\right) \qquad \text{(eqn. 5)}$$

The policy value for each Sign implicated in the DPM is computed by adding the cost estimate for its transition to the minimum path cost already recorded for the node that the transition leads to. If a lower cost path is encountered, spreading activation is restarted from that node to minimize path costs at higher valence levels.

Construction of the Dynamic Policy Map is complete when no further p-hypotheses can be implicated and no further path cost minimization can occur. Following construction of the DPM, the Agent has an estimate of the total “cost” to attain the top-goal for every s implicated in the map. The DEM may then simply select the action associated with min P_v(s), s ∈ S*, in the p-hypothesis containing that s, and route it to the Agent.
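The construction just described can be sketched as a label-correcting, breadth-first search run backwards from the top-goal. This is an illustrative reading under stated assumptions, not the DEM source. P-hypotheses are assumed to be `(context, action, consequence, c_r)` tuples, and `action_cost` is a caller-supplied function. Each node stores its policy value P_v together with the action of its first arc.

```python
from collections import deque

def build_dpm(hypotheses, top_goal, action_cost):
    """Spread activation back from the top-goal; return {sign: (P_v, first action)}."""
    incoming = {}                                  # consequence Sign -> arcs leading to it
    for context, action, consequence, c_r in hypotheses:
        incoming.setdefault(consequence, []).append((context, action, c_r))

    dpm = {top_goal: (0.0, None)}                  # the top-goal has zero cost (valence 0)
    frontier = deque([top_goal])
    while frontier:                                # breadth-first order of expansion
        node = frontier.popleft()
        node_cost = dpm[node][0]
        for context, action, c_r in incoming.get(node, []):
            if c_r <= 0.0:
                continue                           # never corroborated: no usable arc
            cost = node_cost + action_cost(action) / c_r   # eqn. 4 arc cost plus path cost
            if context not in dpm or cost < dpm[context][0]:
                dpm[context] = (cost, action)
                frontier.append(context)           # re-activate to propagate the cheaper path
    return dpm

hyps = [("A", "east", "B", 1.0), ("B", "east", "C", 0.5), ("A", "north", "C", 0.2)]
dpm = build_dpm(hyps, "C", lambda a: 2.66)
# "A" is first reached via "north" (cost 13.3), then the cheaper path through "B" replaces it
print(dpm["A"])
```

Because every arc cost is positive, the re-activation loop terminates. The final values are the minimum path costs of eqn. 5.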

If a currently active Sign is included as a node in the DPM (DPM ∩ S* ≠ ∅), the action a included in the p-hypothesis arc associated with the Sign node with the lowest P_v (eqn. 5) is selected. This is the action with the lowest overall estimated cost to achieve the top-goal. Where there is no intersection between the set of active Signs and the nodes of the DPM, an exploratory action is selected. These new actions will either (1) reach the goal directly, (2) lead to a situation where action selection from the DPM may continue, or (3) cause new p-hypotheses to be created, which in turn expand the scope of the DPM. The DPM is recomputed frequently: whenever goals change, new p-hypotheses are formed, or existing ones have undergone sufficient corroboration to indicate that a different solution path may be preferable.
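The selection rule, including the fall-back to exploration, can be sketched independently of how the map was built. The map layout `{sign: (policy_value, action)}` and the `select_action` signature are assumptions for illustration. Exploration is random, as in the current implementation described in section 5. When no goal is set, an empty map is passed, so exploration is always chosen.

```python
import random

def select_action(dpm, active_signs, actions, rng):
    """dpm: {sign: (policy value P_v, action)}; returns (action, mode)."""
    candidates = [s for s in active_signs if s in dpm and dpm[s][1] is not None]
    if candidates:                                  # DPM and S* intersect
        best = min(candidates, key=lambda s: dpm[s][0])
        return dpm[best][1], "goal-directed"
    return rng.choice(actions), "exploratory"       # no intersection (or no goal): explore

dpm = {"C": (0.0, None), "B": (5.32, "east"), "A": (7.98, "east"), "D": (3.0, "south")}
rng = random.Random(0)
print(select_action(dpm, {"A", "D"}, ["north", "south", "east", "west"], rng))
# ('south', 'goal-directed')
```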



7 An Example 

To illustrate the two essential properties of the DEM algorithm, unsupervised learning and policy map generation, Figure 1 shows a simulated robot learning task for navigation. The robot may recognize some 74 individual locations on a grid within the environment. These equate directly to individual Signs (and, in this instance, to “states”). The robot is supplied with four actions with which to traverse between these locations. These experimental conditions (but not the actual layout) accurately reflect those described by Sutton ([13]). We note that, in simulation, each action takes 2.66 seconds on average; this is used as the action cost. Initially the robot is allowed 2000 exploratory, knowledge-gathering (randomly selected) actions. This is a period of latent learning: no goal is set, nor is any other form of reward provided. Nevertheless, a Sign List and a corpus of p-hypotheses are constructed in the Hypothesis List by the novelty and unexpected event methods, and subsequently corroborated by the tactical learning method. Initially both Lists were empty. Random exploration is inefficient in this environment, as the robot tends to become “trapped” in “rooms” for extended periods. This number of actions nevertheless ensures that every location is visited more than once. Ordered exploration techniques can reduce the time needed to complete the exploration procedure, but otherwise have marginal impact on the underlying learning mechanism.

Immediately following this period of exploration, the experimenter establishes the Sign detecting location “A” as top-goal. The DEM computes the Dynamic Policy Map cost estimates, which are visualized in Figure 1a. The column heights correspond to the total cost estimate from that Sign/location (their position) to the goal. After so much exploration, the task is learnt well. The Dynamic Policy Map corresponds closely to our intuition of a “cost gradient”, flowing from “room” edges, through “doors”, along the central “corridor”, etc. The use of a navigation-based example here allows us to think of the Dynamic Policy Map as a “Cognitive Map” ([11]) in a quite literal sense. In other agent-based applications there is no clear mapping of cost estimate to place, and satisfactory visualization is harder to achieve. Next, the Sign detecting location “B” is made top-goal, and the DPM immediately adopts the cost estimate configuration of Figure 1b. Figure 1c shows the DPM cost estimates when the Sign for location “C” is established as top-goal.





Figure 1: Robot Task and Dynamic Policy Maps for Different Goal Locations
[Panels: a: DPM to goal location “A”; b: DPM to goal location “B”; c: DPM to goal location “C”. Notes: 1) α = 0.5, β = 0.2, λ = 1.0; 2) action cost = 2.66; 3) wall locations shown as 0.0.]

Not shown in these visualizations is the action associated with each Sign in each of the three Dynamic Policy Maps. In each case, the action selected with the agent at any particular location is the first action on the path with the lowest total estimated cost. As the goal, and hence the DPM, changes, the action selected in response to any particular Sign stimulus may therefore change dramatically. The spreading activation algorithm is fast: the DPM in these examples is computed in “real-time” (<10 ms on a 166 MHz Pentium P5 running Linux). [17] describes an extensive series of investigations using the DEM under a variety of learning conditions, imposed uncertainty and varying environments.
8 Discussion

The Dynamic Expectancy Model addresses and unifies three issues central to the 
construction of a fully autonomous A-life Agent. First, that of true unsupervised 
learning based on particular properties of the prediction process. Second, the ability to 
define motivations that drive the overt behavior of the Agent. Third, a mechanism to 
select actions according to a uniform measure relative to the Agent’s current state of 
knowledge (the Hypothesis List), its current motivations (the top-goal) and the 
situation as the Agent perceives it (S*). 

The contribution of this work lies not so much in showing that each of its capabilities is possible. It lies in the careful selection and tight integration of the parts to produce an Agent controller capable of robust action selection performance and self-sustaining, self-sufficient learning. The DEM is distinguished from the overwhelming majority of learning reactive systems¹ by its decisive adoption of internal prediction to corroborate schema-like connections, and by its move away from reliance on external or task-specific reward.

The structural learning component of the DEM represents a significant advance over that used by its (arguably) closest precursor system, due to Drescher ([3]). Drescher’s computationally intensive “marginal attribution” schema learning process is discarded in favor of learning by novelty and by unexpected events. Marginal attribution requires extended periods of exploration to first construct viable action-outcome pairs, followed by further periods of exploration to establish and subsequently extend the context part of the new schemas. In the DEM, p-hypotheses (the schema-like objects) are created as the opportunity arises, but always from a combination of events that can occur in the environment (they did occur, and were stored in the activation traces).

This method is also clearly distinct, in its directness, from the mutation and crossover approach to structural learning used in genetic algorithms and classifier systems ([1], [4], [5], [10]). Riolo ([10]) describes CFSC2, which extends the classifier system model with a three-part, schema-like representation. CFSC2 adopts a forward chaining approach to detecting possible future rewards, rather than the goal-directed backward chaining approach used in the DEM.

The Dynamic Policy Map is not a plan; it is a temporary mapping between sensory 
conditions and actions to take. It does not define a path from start to finish; rather it is 
a characterization of the Signs known to the system according to an estimated cost 
from that Sign to the primary source of motivation, the top-goal. In use, it is similar to 
a reactive look-up table: if this is the current situation, then select this action. In this 
respect, it is similar to the policy map of, say, Watkins’ ([16]) or Sutton’s ([13]) Q- 
learning based algorithms. It is profoundly different in that the policy is computed 
(and re-computed) frequently and quickly, relative to a specific goal, rather than as an 
iteratively formed static policy estimating future discounted and anonymous rewards. 

In comparative tests [17] using highly stylized navigation tasks of the type described in section 7, the DEM can show dramatic performance improvements (up to the order of 40:1) over a conventional Q-learning algorithm (compared to results presented in [13]). This is due almost entirely to the fact that the connection strength (C_r) can be updated at every step, rather than occasionally at the end of a long sequence of actions. In more realistic environments other factors tend to mask these apparent gains.

¹ A notable exception is Tani’s neural-network-based “prediction learning” method [14].

The DEM can be seen as an exemplar of Grefenstette’s notion of “Anytime 
Learning” ([4]). New p-hypotheses can be formed at any point in the Agent’s 
existence, immediately extending its behavioral repertoire. Corroboration is also an 
ongoing process throughout the Agent’s “lifetime”. 

The Dynamic Expectancy Model defines a style of autonomous Agent controller; it has been implemented and tested in a range of conditions. It learns autonomously and has been found to be flexible and responsive to changing conditions. Work on the Model continues.
Abstract. Genetic algorithms display at least one characteristic that is typical of the economic behavior of human decision-makers. I show that if a choice problem involves uncertainty, genetic algorithms may produce results that are consistent with an aversion to risk.



1. Introduction 

Genetic algorithms are generally employed to optimize an objective function in a high-dimensional search space. In this paper, a further dimension is added to the algorithm’s chore: uncertainty. As I show, a very important consequence of uncertainty is that the genetic algorithm produces solutions that exhibit risk aversion. In fact, risk aversion may have evolved in the decision-making behavior of humans in the manner suggested by the algorithm. (See Szpiro [1997a].) Since risk-averse behavior in humans is a consequence of the existence of a utility function, I make the bold (and somewhat tongue-in-cheek!) claim that genetic algorithms exhibit utility for wealth.

Examples of applications of genetic algorithms in economics include Holland and Miller [1991], Arthur [1991], Palmer et al. [1994], Arifovic [1994], Allen and Karjalainen [1993], Vriend [1994], Andreoni and Miller [1995], and Szpiro [1997a and 1997b]. Researchers and practitioners who seek solutions to optimization problems must be aware that genetic algorithms are not “risk-neutral”: they find solutions that are consistent with aversion to risk. Apart from shedding more light on the use of genetic algorithms for economic decision problems, the results of this paper can be considered a further example in the burgeoning literature on emerging behavior and artificial life. Evolutionary thinking and genetic algorithms have been applied to problems in biology, psychology, linguistics, political science, economics, management science and other disciplines. These applications have shown that complex phenomena resembling the behavior of human beings or the attributes of societies can spontaneously evolve in such computer simulations. (See, for example, Langton [1989] or Kauffman [1995].) The emergence of risk aversion and utility functions is an additional instance of such a phenomenon.







The next section gives a short description of genetic algorithms and presents an investment choice model to which the algorithm will be applied. Section 3 presents the evidence on risk aversion that emerges through the genetic algorithm. I also analyze the reasons that underlie this phenomenon and then show that the algorithm’s risk aversion is linked to the selection against bankruptcy risk. In Section 4 I assess the algorithm’s degree of risk aversion and determine its “utility for wealth”. Section 6 concludes with a short summary.



2. Choice under Uncertainty 



I will not discuss genetic algorithms in detail and, instead, refer the reader to Szpiro (1997a). Let me now introduce a simple choice model to which we apply the algorithm. Agents must divide their wealth between cash and stock. Cash returns a yearly, certain interest i; stock carries yearly dividends. In each period an exogenous price is given for the shares, which is normally distributed with mean value P and standard deviation σ. The dividend rate is also a random variable, normally distributed with mean δP and standard deviation δσ. We allow the agents to go into debt and to hold stock short.¹ In this case debtors are charged interest, and short holders of stock must pay the dividends.

In each period all agents have to decide how many shares to buy or sell. We assume that the agents' decisions are made according to the formula

$$B_t = \beta - P_t \qquad (1)$$

where B_t is the number of shares purchased in period t (or sold, if B_t is negative), and β is a parameter particular to each agent. (The lower the market price, the more stock the agents buy, and vice versa.)

The agents start with an initial cash position C_0, trade, and get or pay interest and dividends at the end of the period. Without going into details, we state that an agent's wealth at the end of t periods is

$$W_t = C_0(1+i)^t + \ldots \qquad (2)$$
where P_τ is the share price, δ_τ the dividend, and S_τ the accumulated amount of shares in period τ. P_t is the share price at the end of the horizon, P the mean market price of the share, and z a normal deviate with standard deviation σ. The agent's expected wealth becomes



¹ Holding stock “short” means that the agent sells stock he/she does not yet own, in the hope that its price will fall. Then, at a later point in time, the stock is bought at a lower price.
$$E(W_t) = C_0(1+i)^t + \left(\beta - P - \frac{\sigma^2}{P}\right) \cdot \ldots \qquad (3)$$

In order for expected wealth to be positive, β must be greater than P + σ²/P whenever

$$\delta > \delta^* = \frac{i}{1+i}, \qquad (4)$$

and vice versa for

$$\delta < \delta^* = \frac{i}{1+i}. \qquad (5)$$



In particular, because expected wealth is a linear function of β (see equation 3), we expect its maximum to lie at a corner solution. Hence the β-values of “rational” agents should evolve towards β_max whenever equation (4) holds, and towards zero when equation (5) holds.
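As a quick check on eqns. (4) and (5), the threshold δ* and the corner solution can be computed directly. The function names are illustrative. The rule applies to the certainty case (σ = 0) examined in section 3, where the interest rate is 15 percent.

```python
def delta_star(i):
    """Critical dividend rate of eqns. (4) and (5): delta* = i / (1 + i)."""
    return i / (1 + i)

def rational_beta(delta, i, beta_max):
    """Corner solution under certainty: buy to the limit if delta > delta*, else beta = 0."""
    return beta_max if delta > delta_star(i) else 0.0

print(round(100 * delta_star(0.15), 5))                                # 13.04348
print([rational_beta(d / 100, 0.15, 100.0) for d in (12, 13, 14)])     # [0.0, 0.0, 100.0]
```

The printed threshold, 13.04348%, is the value at which the algorithm's jump is reported to occur in section 3.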

We now describe the subroutines of the algorithm that has been designed to solve the problem in an evolutionary manner:

• Initialization. Each agent receives an initial endowment of cash, C_0 (which could be zero), and a β-value, randomly chosen between zero and β_max.

• Trading. A market price for the stock and a dividend rate are given exogenously. The agents buy or sell stock according to their trading formula (equation 1). They then receive (or pay) the interest on their cash holdings and the dividends on their stock holdings. This is repeated for t trading periods.

• Computing the fitness. After t trading periods, the wealth of the agents is computed, based on the current market price of the stock.

• Ranking of the agents. The agents are ranked according to their market wealth.

• Choice of mates. The fittest agent has first choice. It chooses a mate from among the remaining agents, with each agent's probability of being chosen proportional to its wealth. Then the next fittest agent chooses, and so on, until half the agents have formed pairs. The remaining half die.

• Reproduction. Each pair has four offspring. We chose the following way of passing “genetic information” about β: two offspring receive values of β identical to those of their parents; the β-values of the other two offspring are computed as the arithmetic mean of the β-values of the two parents.² (The first two offspring preserve fitness, while the second pair makes provision for improvement.) The combined cash and stock holdings of the parent pair are divided equally among the four offspring.

• Mutation. A small percentage of the agents' β-values are mutated at random. The mutated β's again lie between zero and β_max.



² Genetic algorithms generally use the so-called crossover operator to pass on information to the next generation. This involves translating the parents’ β-values into binary notation, interchanging parts of the binary strings, and re-translating. I use the less sophisticated but simpler method of averaging the parents’ β-values. This creates strong pressure for reversion of the β-values to the mean, however, and further studies should investigate the effect of binary encoding and bit-string crossover.
The subroutines of the algorithm, except Initialization, are re-run for a number of generations.
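The algorithm can be sketched as below, with the parameters of section 3 as defaults. This is a simplified re-implementation under stated assumptions, not the author's program. Function names are hypothetical. The assumptions are:

- Eqn. (2) is not reproduced. Instead, agents trade each period at a normally distributed price.
- Dividends (mean δP, standard deviation δσ) are credited before interest accrues. This timing is chosen because it reproduces the certainty threshold δ* = i/(1+i).
- Wealth is shifted to be positive before it is used as a mating weight.
- Each offspring mutates independently with probability 2.5%.

```python
import random
import statistics

def trade(agents, rng, delta, sigma, price, interest, horizon):
    """Run `horizon` trading periods; agents are [beta, cash, shares] lists (updated in place).
    Returns each agent's market wealth."""
    for _ in range(horizon):
        p_t = rng.gauss(price, sigma)                     # exogenous share price
        div = rng.gauss(delta * price, delta * sigma)     # dividend per share
        for a in agents:
            b_t = a[0] - p_t                              # eqn. (1): shares bought (sold if < 0)
            a[1] -= b_t * p_t
            a[2] += b_t
            # assumption: dividends credited before interest accrues (or is charged on debt)
            a[1] = (a[1] + a[2] * div) * (1 + interest)
    p_end = rng.gauss(price, sigma)
    return [a[1] + a[2] * p_end for a in agents]

def reproduce(agents, wealth, rng, beta_max, mutation_rate):
    order = sorted(range(len(agents)), key=lambda k: wealth[k], reverse=True)
    unpaired, pairs = order[:], []
    while 2 * len(pairs) < len(agents) // 2:              # until half the agents form pairs
        chooser = unpaired.pop(0)                         # fittest unpaired agent chooses
        low = min(wealth[k] for k in unpaired)
        # assumption: shift wealth so that every weight is positive
        weights = [wealth[k] - low + 1e-9 for k in unpaired]
        mate = rng.choices(unpaired, weights=weights)[0]
        unpaired.remove(mate)
        pairs.append((chooser, mate))
    children = []                                         # the unpaired half dies
    for x, y in pairs:
        bx, by = agents[x][0], agents[y][0]
        cash = (agents[x][1] + agents[y][1]) / 4          # holdings divided among 4 offspring
        shares = (agents[x][2] + agents[y][2]) / 4
        for beta in (bx, by, (bx + by) / 2, (bx + by) / 2):
            children.append([beta, cash, shares])
    for c in children:
        if rng.random() < mutation_rate:
            c[0] = rng.uniform(0, beta_max)
    return children

def run_ga(delta, sigma, n_agents=120, beta_max=100.0, price=50.0, interest=0.15,
           horizon=2, generations=50, mutation_rate=0.025, seed=0):
    rng = random.Random(seed)
    agents = [[rng.uniform(0, beta_max), 0.0, 0.0] for _ in range(n_agents)]
    for gen in range(generations):
        wealth = trade(agents, rng, delta, sigma, price, interest, horizon)
        if gen == generations - 1:                        # poll the top 90 percent
            order = sorted(range(n_agents), key=lambda k: wealth[k], reverse=True)
            return statistics.mean(agents[k][0] for k in order[: int(0.9 * n_agents)])
        agents = reproduce(agents, wealth, rng, beta_max, mutation_rate)

print(run_ga(0.25, 0.0), run_ga(0.05, 0.0))
```

Under certainty (σ = 0), `run_ga(0.25, 0.0)` should return a much larger β than `run_ga(0.05, 0.0)`, in line with the corner solutions of eqns. (4) and (5). The exact values depend on the random seed. Passing σ > 0 gives a setting for exploring the uncertainty experiments of section 3.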



3. Certainty vs. Uncertainty 

The genetic algorithm is now applied to the investment choice problem described in the previous section. The population consists of 120 agents, each endowed with an initial value of β randomly distributed between zero and 100. Initial cash is zero, and the interest rate is 15 percent. Hence δ* = 0.15/1.15 ≈ 13.04%. The algorithm goes through fifty generations of trading, ranking and reproduction. After the agents of the last generation have been ranked, the top 90 percent are “polled”, i.e., the mean of their β-values is computed.³ In order to eliminate random variations, the algorithm is rerun a few dozen or a few hundred times, and the average β of these runs is computed. The mutation rate in each generation is 2.5%, and initially the agents' time horizon is two trading periods.

Let us first investigate what happens under conditions of certainty (P = 50, and δ = 5, 6, ..., 25%). Since σ = 0, the market price and the dividend rate are given with certainty in every period. Figure 1 presents the β-values that evolve in this situation. To the uninitiated it may seem quite surprising at first that the genetic algorithm produces exactly the results predicted by the theory. For δ < 13%, the average value of β is close to zero; at δ = 13%, β abruptly jumps to a value close to one and stays there thereafter.



Fig. 1. Certainty
[Number of offspring = 4, standard deviation = 0.0, dividend rate = 5 to 25%, time horizon = 2, number of generations = 50, runs = 10. Horizontal axis: dividend rate (%).]
³ Taking less than 100 percent of the surviving population eliminates some of the worst performers, who would probably have been weeded out in the following generation anyway.
Moreover, the results bred by genetic algorithms are very precise: the program finds the point of indifference between buying and selling stock with an accuracy equal to the precision of a pocket calculator. When running the algorithm with increments in the dividend rate of 0.00001%, the jump from β ≈ 0 to β ≈ 1 takes place at 13.04348%, which is the precise value of δ* to five digits after the decimal point.

But the more interesting phenomenon occurs when uncertainty enters the model. Let the price of the stock and the dividend rate be normally distributed around P and δ, with standard deviations σ and δσ, respectively (P = 50, and δ = 5, 6, ..., 25%).⁴ Figure 2 depicts the results for standard deviations of 1, 3, and 5.



Fig. 2. Uncertainty
[Details as in Figure 1, except standard deviation 1.0, 3.0, and 5.0. Horizontal axis: dividend rate (%).]



Comparing Figures 1 and 2, we note that the β-values are not where we expect them to be! For δ < 13% the β-values are close to, but distinctly greater than, zero; for δ > 13% they are close to, but distinctly smaller than, unity. The discrepancies are larger the greater the uncertainty. This means that whenever our artificial economy is governed by uncertainty, the agents do not buy or sell shares to the full extent of their possibilities. They have become cautious, that is, risk averse.

How does the phenomenon of risk aversion arise in genetic algorithms that, in principle, do nothing but carry out computations in a “disinterested”, that is, risk-neutral, fashion? The answer to this question goes to the heart of evolution, of genetic algorithms, and of the realities of business life. Consider the fitness ranking, that is, the ranking of the 120 agents by wealth. Risk-neutral agents, who bought (or sold) the maximum possible amount of stock, sometimes find themselves at the top of the ranking and at other times at the bottom, depending on whether they face a bull market or a bear market.⁵ Let us assume a bull market. Obviously, risk-neutral agents come out on top. The market direction may persist for an inordinately long time, letting risk-neutral agents accumulate an extreme amount of wealth. Otherwise, these agents inevitably fall from the top of the list to the bottom when the market turns against them. In this case they die, and their β-values are no longer transmitted to subsequent generations. Random mutations may produce risk-neutral agents again in a future generation, but sooner or later these will disappear too.

⁴ The stock markets could be more appropriately modeled with the log-normal distribution. For the sake of simplicity, I use the normal distribution.

Now let us investigate the fate of risk-averse agents. Cautious as they are, they buy and sell less stock than their risk-neutral colleagues. In an advantageous market they probably will not make it all the way to the top of the list. On the other hand, they usually do not fall quite to the bottom of the ranking either when the market turns against them. Hence, they and their β-values are the candidates most likely to survive for 50 generations. A fortiori, risk-loving agents are weeded out even faster than the risk-neutral ones.

Hence the emergence of risk aversion is a consequence of the risk of death, the risk of an evolutionary dead-end. Only the agents ranked in the top half of the list choose partners and reproduce, while the lower-ranked ones tend to die out. With their demise, the features that characterized their behavior also disappear. In the framework of economics, the risk of death corresponds to the risk of bankruptcy. A business strategy may fail to assert itself vis-à-vis the competition or to cope with adverse market conditions. The corporations that adhere to such a strategy generally become insolvent. They eventually go bankrupt, more often than not without having produced any spin-off companies. As a consequence, the characteristics responsible for the failure are likely to disappear in future generations. With genetic algorithms the situation is similar: risk-neutral or risk-taking behavior implies that agents adhering to it die whenever exogenous circumstances turn against them. Risk aversion is prevalent in genetic algorithms whenever the following two circumstances are present: there must be uncertainty, and the fitness-based selection must contain a risk of death for badly adapted agents. The existence of uncertainty implies that there is no dominant strategy. A choice strategy that is optimal under certain conditions must lead to a low ranking and to failure when conditions change. Failure, in turn, implies that the choice strategy of the unsuccessful agent eventually disappears. Hence the emergence of risk aversion is robust: whenever these two conditions are satisfied, a genetic algorithm will exhibit risk aversion.



4. What’s a Computer’s Utility for Wealth?

In economics one assumes that human decision makers have a utility for wealth U(W). Two conditions on utility functions are usually stipulated:

$$U'(W) > 0 \quad \text{and} \quad U''(W) < 0. \qquad (6)$$



⁵ Brokers call a period in which most prices on the stock exchange rise a “bull” market. As
The first condition, which expresses the fact that more wealth is preferable to less wealth, can be considered a truism. The second condition is more questionable. Risk aversion implies that the marginal utility for wealth decreases, i.e., that the second condition holds. (For a short explanation see Szpiro 1997a.) However, exceptions are conceivable. (See, for example, Szpiro [1992].) In spite of this, it is generally accepted that the vast majority of economic agents are risk averse in most choice situations. As we showed above, genetic algorithms also subscribe to this behavior. Hence the question may be asked: “how risk averse is a genetic algorithm?” As I will show presently, an answer to this question implies an answer to the question “what’s a genetic algorithm’s utility for wealth?”⁶
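The link between risk aversion and decreasing marginal utility (U'' < 0) can be illustrated with the certainty equivalent of a gamble. For a concave utility, the certainty equivalent lies below the expected wealth. The square-root utility and the 50/50 gamble between 0 and 100 below are illustrative choices, not taken from the paper.

```python
import math

def certainty_equivalent(outcomes, probs, u, u_inv):
    """Sure wealth whose utility equals the expected utility of the gamble."""
    eu = sum(p * u(w) for w, p in zip(outcomes, probs))
    return u_inv(eu)

gamble = ([0.0, 100.0], [0.5, 0.5])       # expected wealth 50
print(certainty_equivalent(*gamble, lambda w: w, lambda x: x))     # 50.0: linear, risk neutral
print(certainty_equivalent(*gamble, math.sqrt, lambda x: x * x))   # 25.0: concave, risk averse
```

An agent with the concave utility would accept a sure 25 in place of a gamble whose mean is 50. The gap of 25 is its risk premium. A linear utility, like the one naively expected of a computer, has no such gap.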

At first it may seem quite absurd to attribute human-like sentiments to an inanimate object like the computer, for instance a preference for more money over less, or an aversion to risk. A calculating machine has no use for wealth, nor, by the same token, does it have more utility for large numbers as opposed to small ones. One might nevertheless choose to pose such arcane questions as “what's a computer's utility for wealth?”, say in a maximization problem. One would then be forgiven for assuming that a computer should exhibit behavior consistent with linear utility for wealth. After all, the machine simply carries out numerical instructions (for example, a linear programming algorithm to maximize an objective function), so how could it mimic anything but linear utility?

But by now we are no longer surprised to find that computers, when faced with uncertainty, exhibit what can duly be described as risk aversion. Thus computers display what is known in economics as “marginally decreasing utility for wealth”. The utility function that seems to govern the computer's behavior can even be estimated. I will show in this section that under certain conditions a computer's “utility” for wealth, U(W), is given (up to integration constants) by the function⁷

$$U(W) = \ldots \qquad (7)$$

Further numerical experiments show that the degree of risk aversion varies with the length of the time horizon, the probability of survival, and possibly other parameters. (See Szpiro 1997a.) Even the value of β_max, which determines the maximum number of shares that can be bought in a single period, has an effect on the degree of risk aversion. This is a bit surprising, and we will investigate the implication a little further. When β_max is, say, 100, risk-averse agents may, for example, exhibit a β of 95 under certain conditions. When β_max is increased to 1000, the surviving agents will have β-values of, say, 950. So in both cases risk aversion evolves, but its degree is related to the value of β_max! The agents obviously discover that more money can be made if more stock can be bought. Caution remains an essential virtue, however: when pitched against risk-neutral individuals, the risk-averse ones always win. How risk averse the agents should be in order to survive



“bear” markets they designate markets with falling prices.

⁶ The conclusions of this section are still somewhat speculative, and more extensive studies are warranted to confirm the numerical results.

⁷ More precisely: the algorithm gives results which are consistent with a utility function of this form.
depends on what the rest of the population does. Hence, risk aversion is not an absolute, innate phenomenon of genetic algorithms, as the maxim “more is preferable to less”⁸ is, for instance. It is a relative phenomenon that emerges because of competition with others.

I choose a time-horizon of 15 periods, which seems reasonable for
medium-range financial decisions; p_max is 100, and the probability of survival is fifty
percent. The p-value which is preferred by an individual who has utility function
U(W) is given implicitly by the solution to the problem of maximizing expected utility
of wealth,



$$EU(W_t) = \int U(W_t)\,\phi(P_t)\,dP_t \tag{8}$$



where W_t is derived from equation (3), P_t is normally distributed around P̄ with
standard deviation σ, and φ(P_t) denotes the density of the normal distribution.

250 runs of the genetic algorithm were performed for t = 15, δ = 12.5%, σ = 5, P̄ =
50, and for each of four initial cash positions (C_0 = 0, 5000, 50000, 100000). There is no
reason that initial wealth should have any effect on the value of p and, indeed,
numerical experiments showed that no significant change is discernible when C_0 is
varied. Thus, our first finding is that constant absolute risk aversion prevails, and
therefore the utility function is (up to multiplicative and additive constants)

$$U(W) = -e^{-kW}, \tag{9}$$

where k is the degree of absolute risk aversion. (For details, see Pratt [1964] or Arrow
[1965].) With W_t from equation (3), this is the function that is inserted into the
integrand of equation (8). Since this integral cannot be solved in closed form, a
software package is used to perform the integration numerically for different values of
p. The p-value that gives the maximum expected utility is the one that individuals
with utility function (9) prefer.
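The numerical procedure just described (insert U into the integrand of (8), integrate for each candidate p, keep the maximizer) can be sketched as follows. This is a minimal illustration under stated assumptions, not the program used in the paper: equation (3) is not reproduced in this section, so the linear wealth function W = C_0 + p(P - b) and the purchase price b = 49.6 are hypothetical stand-ins; P̄ = 50, σ = 5, p_max = 100 and k = 0.00032 are taken from the text. With utility (9), the factor e^{-kC_0} scales every expected utility equally, so the preferred p should not depend on C_0.

```python
import math

def expected_utility(p, c0, k=0.00032, buy_price=49.6, mean=50.0, sigma=5.0,
                     n=400, width=8.0):
    """E[U(W)] for U(W) = -exp(-k W), with the HYPOTHETICAL wealth
    W = c0 + p * (P - buy_price) and P ~ Normal(mean, sigma).
    Integral (8) is approximated by the trapezoidal rule on mean +/- width*sigma."""
    lo = mean - width * sigma
    step = 2.0 * width * sigma / n
    norm = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for i in range(n + 1):
        price = lo + i * step
        density = math.exp(-0.5 * ((price - mean) / sigma) ** 2) / norm
        wealth = c0 + p * (price - buy_price)
        weight = 0.5 if i in (0, n) else 1.0   # trapezoid end-point weights
        total += weight * -math.exp(-k * wealth) * density
    return total * step

def preferred_p(c0, p_max=100, **kwargs):
    """Grid search: the integer p in 0..p_max that maximizes expected utility."""
    return max(range(p_max + 1), key=lambda p: expected_utility(p, c0, **kwargs))

for c0 in (0, 5000, 50000, 100000):
    print(c0, preferred_p(c0))
```

Under this hypothetical linear payoff the optimum can also be found analytically as (P̄ - b)/(kσ²) = 0.4/0.008 = 50, which the grid search reproduces for every C_0, mirroring the observation that initial cash has no discernible effect.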

The mean p-value of the 1000 runs is 14.1, with a standard error of 8.8. Since the
distribution of the results is asymmetric, and owing to some outliers, more than
78% of the p-values actually lie within one standard error of the estimate, i.e., within the
interval [5.3, 22.9]. Inserting these figures into equation (8) and solving numerically
results in a value k = 0.00032. The 78% confidence level puts k in the interval
[0.000251, 0.000442]. The measure of absolute risk aversion was defined by Pratt
[1964] and Arrow [1965] as

$$r_a(W) = -\frac{U''(W)}{U'(W)}. \tag{10}$$
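As a quick check on definitions (9) and (10), the sketch below estimates r_a(W) = -U''(W)/U'(W) by central finite differences for U(W) = -e^{-kW} and confirms that it equals k at every wealth level, i.e., that this utility function exhibits constant absolute risk aversion. The step size h and the wealth levels tested are arbitrary illustrative choices.

```python
import math

def cara_utility(w, k=0.00032):
    """Utility function (9): U(W) = -exp(-k W)."""
    return -math.exp(-k * w)

def absolute_risk_aversion(u, w, h=1.0):
    """Pratt-Arrow measure (10), r_a(W) = -U''(W) / U'(W), via central differences."""
    first = (u(w + h) - u(w - h)) / (2.0 * h)
    second = (u(w + h) - 2.0 * u(w) + u(w - h)) / (h * h)
    return -second / first

# The estimate stays at k = 0.00032 regardless of wealth (the initial cash positions used above).
for wealth in (0.0, 5_000.0, 50_000.0, 100_000.0):
    print(wealth, absolute_risk_aversion(cara_utility, wealth))
```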

Most empirical studies on human decision makers have shown that the degree of
relative risk aversion,

$$r_r(W) = -W\,\frac{U''(W)}{U'(W)} = W\,r_a(W), \tag{11}$$

is constant in wealth and approximately equal to two, while some studies put this
value as high as 10. (See, e.g., Szpiro [1986].) For the utility function (9),
r_r(W) = kW, so a relative risk aversion of two is reached at W = 2/k. Hence the degree
of risk aversion of a genetic algorithm corresponds to that of a relatively poor person,
whose net assets are about $4,500 to $8,000. With wealth of, say, $10,000, the genetic
algorithm exhibits a degree of relative risk aversion of 3.2, with the 78% confidence
interval being [2.5, 4.4]. Hence, a genetic algorithm with the above parameters
exhibits greater risk aversion than does the average human decision maker, but the
order of magnitude is at least similar to the higher empirical estimates.

‡ "More is preferable to less" is innate to the genetic algorithm insofar as it is incorporated into
the fitness function: the wealthiest agents are ranked first.
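Because r_r(W) = kW for utility (9), the wealth figures quoted above follow from elementary arithmetic on the estimate k = 0.00032 and its 78% interval [0.000251, 0.000442]. The short calculation below reproduces them; the function names are illustrative only.

```python
def relative_risk_aversion(k, wealth):
    """r_r(W) = W * r_a(W); for CARA utility (9), r_a(W) = k, so r_r(W) = k * W."""
    return k * wealth

def wealth_at_relative_ra(k, target=2.0):
    """Wealth at which a CARA decision maker's relative risk aversion equals `target`."""
    return target / k

K_LOW, K_EST, K_HIGH = 0.000251, 0.00032, 0.000442   # 78% interval and point estimate

# Wealth at which r_r = 2 (the typical empirical value): roughly $4,500 to $8,000.
print([round(wealth_at_relative_ra(k)) for k in (K_HIGH, K_EST, K_LOW)])   # [4525, 6250, 7968]

# Relative risk aversion at W = $10,000: 3.2, with interval [2.5, 4.4].
print([round(relative_risk_aversion(k, 10_000), 1) for k in (K_LOW, K_EST, K_HIGH)])   # [2.5, 3.2, 4.4]
```

The same one-line formula converts the absolute values 0.00028 and 0.00145 mentioned below into relative risk aversion of 2.8 and 14.5 at $10,000.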

Even though the question at the beginning of this section was formulated somewhat
tongue-in-cheek, it is nevertheless imperative to know what degree of risk aversion
the genetic algorithm exhibits, in order to assess the solutions it finds to problems
involving uncertainty. Conversely, by varying the parameters of the program one can
custom-design the genetic algorithm to mimic a certain attitude towards risk. In this
manner the algorithm can be endowed with a predetermined degree of risk aversion.
Table 2 exhibits the p-values of some of the possibilities. By varying the number of
offspring between six and 120, and simultaneously varying the time-horizon between
ten and thirty periods, p-values between 6.7 and 39.0 can be simulated with appropriate
combinations of parameters. This corresponds to degrees of absolute risk aversion
between about 0.00028 and 0.00145 or, for a person with wealth of $10,000, to relative
risk aversion between 2.8 and 14.5.



5. Conclusions 

The paper started by displaying the power of so-called genetic algorithms for solving
problems of choice under certainty, and then went on to discuss the algorithms'
behavior in situations of uncertainty. Seemingly incorrect answers given by the
computer program could be traced back to risk aversion which, it turned out, is
inherent to such algorithms as an evolutionary strategy against the risk of bankruptcy.

It may be argued that the attribute of risk aversion has evolved in the human race
during the millennia of its history, because homo oeconomicus has learnt that with
limited time-horizons it does not pay to be risk-neutral or risk-seeking. With genetic
algorithms (computer programs that are designed to search for optimal solutions by
emulating the Darwinian theory of evolution), the situation is identical. The first
condition on the utility of wealth, that more is preferable to less, enters the algorithm
through the fitness function (agents are ordered according to their wealth, the richest
ones being ranked first). The second condition, which corresponds to risk aversion, is
an outcome of evolutionary pressures.
The growth and operation of all living beings are directed by the interpretation,
in each of their cells, of a chemical program, the DNA string or genome.
This process is the source of inspiration for the Embryonics (embryonic electronics)
project, whose final objective is the design of highly robust integrated
circuits, endowed with properties usually associated with the living world:
self-repair (cicatrization) and self-replication. The Embryonics architecture is based
on four hierarchical levels of organization: 1) the basic primitive of our system
is the molecule, a multiplexer-based element of a novel programmable circuit; 2)
a finite set of molecules makes up a cell, essentially a small processor with an
associated memory; 3) a finite set of cells makes up an organism, an
application-specific multiprocessor system; 4) the organism can itself replicate, giving rise
to a population of identical organisms. Our ongoing research efforts try to meet
three challenges: a scientific challenge, that of implementing the original
specifications formulated by John von Neumann for the conception of a self-replicating
automaton; a technical challenge, that of realizing very robust integrated
circuits capable of self-repair and self-replication; and a biological challenge, that
of attempting to show that the microscopic architectures of artificial and natural
organisms, i.e., their genomes, share common properties.
Abstract. This paper deals with the problem of finding a suitable framework
for designing computer simulations that could help us determine the minimal
requirements (both material and organizational) for the origin of the first
full-fledged autonomous systems. The design of a particular model that takes into
account some fundamental thermodynamic requirements is offered and
discussed. This work rests on the belief that Artificial Life models can
inform biology on several fundamental questions (such as the origin and
definition of life), but only provided that they adopt more realistic and
grounded premises, which can lead to more conclusive results.



1 Introduction 

The gap between complex self-organizing phenomena (physico-chemical dissipative
structures) and the simplest biological entities we know of today is too big to be
bridged without some intermediate stage(s). Therefore, we will focus on a
hypothetical stage in which (compartmented) chemical reaction networks
self-maintain recursively and operate in the environment but are not yet complex enough
to self-reproduce reliably and begin a process of open-ended (Darwinian) evolution.
This standpoint is supported by the work of different authors who argue that some
sort of acellular [21] or cellular [15] metabolism must precede systems that synthesize
early polymers exhibiting template activity.

We claim that these chemical reaction networks differed from previous ones in
that they could actively manage the flows of matter and energy necessary to
sustain their own organization. Therefore, they should be regarded as the first
autonomous systems. However, the problem of the origin of these ‘minimal
autonomous systems’ is much more complex than could have been suspected years
ago. In recent articles ([13], [17]) some of us have criticized Artificial Life models of
(basic) living organization, arguing that certain material and energetic aspects which
are crucial to understanding the logic of such organization have been systematically
disregarded in the field. Now we try to move forward along the same critical lines
by offering a concrete proposal: the design of a simulation model of a minimal
(proto-biological) autonomous system that takes into account a few central issues pointed out
in those previous papers.



2 Autonomy: a Fundamental Concept for Biology 

The main theoretical assumption of this paper is that ‘proto-metabolic’ autonomous
systems (even if they disappeared very quickly on a geological time scale) played a
crucial role in the origin of the first living beings. By ‘autonomous system’ we
mean here a system in which a set of processes self-constitutes recursively through
functional/adaptive interactions with its environment. The fundamental difference
from simpler forms of self-organization is that autonomy involves an active
participation of the system in the construction and modification of its own boundary
conditions and constraints as a response to external perturbations.

In the context of prebiotic chemical reaction networks, the appearance of
autonomous entities may be achieved only if some components (or component
aggregates) in the network act as rules/constraints on other components and on the
transformation processes they all go through. The result of that multiple constraining
action constitutes an autonomous entity when a new set of components and aggregates
is produced in a recursive way, creating a physical border that establishes the
mechanisms through which the system will interact with the environment. Since
autonomous systems are able to carry out functional actions (like modifying their
own boundary conditions) in a variable environment, we can say that they are able to
self-maintain “adaptively” in it. Self-construction and robust self-maintenance
involve achieving some active control over the flows of matter and energy through
the system in order to sustain its far-from-equilibrium self-organizing processes.

Therefore, the key to basic autonomy lies in the generation of a suitable set of
constraints: those that actually define the new rules of behavior of the system, its
boundaries, and the terms in which it will keep up its relationship with its environment.
Such a set must include global and local constraints whose different actions are well
coordinated, ensuring the conditions under which the system is physically viable [20].
This implies, on the one hand, developing mechanisms to solve possible
physico-chemical problems (like the osmotic problem) and, on the other hand, articulating the
main coupling mechanisms on which the energetic maintenance of the system is
based: couplings with the external resources (transduction and transport mechanisms)
and couplings among exergonic and endergonic processes within the system (by
means of various energy currencies). The fundamental constraint that allows all this to
be achieved is a membrane.¹

¹ The relevance of cell individuality in the origin of life was already highlighted by Oparin [16].
Since then, the emergence of membrane-enclosed complex chemical networks as a crucial
step in the appearance of life has been defended by numerous authors (see [7] and references
therein). In the field of theoretical biology, the autopoietic school ([9], [20]) has also claimed
that the role of the membrane is central in the characteristic organization of the living.
Two types of basic components are necessary to generate and integrate a
membrane: (i) structural components that can self-assemble and define the physical
boundary of the system (global constraining action) and (ii) functional components
able to carry out different tasks (catalysis, transport, absorption of energy) at
particular loci (local and specific constraining action). An adequate combination of
the constraining actions of these two types of components can potentially generate a
membrane that is not simply a physical border marking a spatial
distinction/asymmetry between what can happen inside and outside, but also a device
that can functionally modify some of the external conditions of the system.

Thus, an autonomous system is able to carry out functional actions on its
environment (active transport actions, for instance), whereas the environment only
interacts with it in a physico-chemical sense. This is why autonomy is deeply
connected to the concept of ‘agent’ (defined as a system whose influence on and from
the environment is not of the same type). Acting functionally on its environment
allows a system to achieve autonomy with respect to some external conditions and to
preserve its organization even during a temporary absence of material and/or
energetic input.

This dimension is crucial to understanding the agency of autonomous systems, and it
becomes especially manifest if we look at the problem of self-maintenance from a
thermodynamic perspective. Purely relational-constructive approaches (like some of
those briefly reviewed in the following section) miss the logic involved in the
mechanisms that allow a recursive control of the flows of matter and energy needed to
carry out functional autonomous actions. That leads them to hold an internalist view
of autonomy, which we think must be thoroughly revised.



3 Critical Review of Artificial Simulations and Realizations of
Self-maintaining (Cellular) Systems

It is difficult to find in the literature simulation models that share the approach we
take in this paper. Most computational systems consider neither
energetic/thermodynamic requirements nor spatial ones (although there are some
interesting exceptions, as we will point out here). An alternative way to produce
artificial proto-organisms is through experiments ‘in vitro’, which have the advantage
of not needing to be explicit about those aspects, since the experiments are carried out in
physical environments. Even if they lack the plasticity of computational models, and
their results are normally obtained under extreme experimental conditions, ‘in vitro’
approaches seem to have a promising future, as we shall see below.

We cannot begin a survey of computational models of self-maintaining systems
without some reference to the (now classical) work on ‘autocatalytic sets’ [5] and
‘algorithmic chemistries’ [6]. Despite the large differences between them, both
approaches look into the problem of understanding the general rules of chemical
reaction combinatorics (in order to generate computational autocatalytic networks or
self-maintaining function algebras), and both obtain interesting results. However,
in addition to neglecting thermodynamic aspects, these models do not consider fundamental spatial
constraints like those related to the presence and action of a membrane.
Computational autopoietic models take a step forward in this direction. They try to
implement the powerful idea of autopoiesis in an artificial environment, tackling the
problem of how to show through a computer program the complementary relationship
that membranes and chemical reaction networks hold in the natural (biological) world.
The “qualitative chemistry” they propose [11, 12] takes place in a discrete,
two-dimensional space where different types of ‘particles’ move and interact according to
certain rules. A membrane is introduced or formed in that environment and some
“self-maintaining” process can begin, provided that an appropriate set of particles is
enclosed by it. This process basically consists in “repairing” possible breaks because,
once constituted, the membrane cannot change its shape, nor grow or shrink.
Besides, transport is merely passive (substrates are absorbed and emitted according to
spatial constraints rather than to internal/external concentrations) and, most importantly,
energetic/thermodynamic requirements are completely disregarded.²

There is another type of computational model in which the problem of how
(proto)cellular structures form is addressed, and in which physical aspects are
seriously taken into consideration. These models try to simulate the way lipid
aggregates self-assemble, assuming realistic features of the lipid molecules, of their
interactions, and of the environment where they coexist (or could have coexisted in
the past). Recently, quite interesting results have been obtained through this approach
[4, 11]. Nevertheless, the focus of these models is not on the interrelation between a
self-assembled membrane and a network of chemical reactions enclosed and
integrated by that membrane. They provide solid computational evidence to support
the claim that protocell structures (like different types of micelles) are likely to be found
in aqueous prebiotic environments, but little more than that in the direction we want to
explore. That is to say, they demonstrate the plausibility of structural membranes but
do not show their potential to assume active roles in prebiotic scenarios.

Finally, let us include among computational models some recent work that marks a
very interesting research direction: the design of a model (called Nidus [19])
which aims to investigate the essential components and interactions required to
support the origin and evolution of living organizations. The author makes a serious
attempt to introduce energy into his scheme, but he deals with artificial evolutionary
issues on top of basic organizational ones, which makes the whole thing considerably more
difficult. Our main criticism is that the model does not properly take into account the
question of the physical border/membrane, although its relevance is theoretically
acknowledged.

Models of prebiotic systems can also be studied in real environments, as we said
above. Realizations ‘in vitro’ are more reliable and straightforward than
computational simulations in the sense that they do not have to make explicit
assumptions about material and energetic requirements (beyond the actual
specifications of the experiment) in order to set up the rules of the test. In this context,
the idea would be to generate autonomous chemical reaction networks, but without
using techniques/components directly borrowed from biological systems. This is a
very hard task and, so far, achievements are only partial (like medical implant
chemical devices, which are functional only insofar as they are inserted into a fully
integrated reaction network). Nevertheless, even though the technology to create
‘chemical reaction automata’ is just making its first moves, some interesting results
have already been achieved.

² This is a criticism that can be extended to the whole theory of autopoiesis. Maturana and
Varela may well acknowledge that any realization of autopoiesis (as a chemical system) must be kept away
from equilibrium and, thus, be open to flows of matter and energy, and so on [9].
However, they assume that this has no influence on the logic of the system; that is, they do
not look into its possible theoretical consequences (considering ‘organization’ as fully
abstracted from and prior to ‘structure’). Hence, their perspective on autonomy becomes
strongly internalist, disregarding the interaction with the environment as a crucial factor for
creating and maintaining biological organizations.

A particularly relevant result was the production ‘in vitro’ of micellar systems
hosting a reaction which yields a surfactant product that later assembles
spontaneously into the boundary [1]. In these experiments, micelles or vesicles, once
formed, have a catalyzing effect on the internal production of the component that
makes them up and, consequently, they induce their own replication. Such systems
are considered to be good examples or realizations of (minimal) chemical autopoiesis
[8], since they show in a very simple and clear way the complementarity between the
physical boundary and the reaction processes that take place within that boundary.
However, these vesicles do not perform any kind of specific transport process.

Given that aggregates of amphiphilic molecules (micelles, vesicles, liposomes) do
not perform any specific transport processes, D. Deamer and his colleagues have
recently implemented different systems to overcome the problem of substrate
accessibility to the encapsulated environment (see [3] and references therein).

However, neither Luisi's nor Deamer’s systems can overcome the problem of an
osmotic crisis caused by the accumulation of polyanions inside the vesicle once
it has initiated a replicative cycle, unless adequate experimental conditions are
externally established.

For a good discussion and review of ‘chemical reaction automata’ we direct
readers to an article by P. Bro [2]. This author argues for the relevance of
thermodynamic aspects and of the role of “permselective enclosures” in achieving
autonomy. Our approach is akin to this, as the reader will see next.



4 Description of the Model and Discussion 

This model addresses the problem of how a cellular system may become a
self-maintaining autonomous one. The idea is to begin with a set of components
“enclosed” by a membrane, establishing the basic reactions and transport processes
that these components can undergo. Then, given a variable input of energy, the system
must find a way to manage it adequately in order to self-maintain. This means that it
has to opt for a particular set of couplings between the different reactions/processes
that take place in it; otherwise it will have no chance of overcoming the two basic
problems it faces: (i) preserving the membrane (and its dynamic behavior) and (ii)
avoiding an “osmotic crisis”. Furthermore, an additional problem could be relevant,
depending on the type of energetic input provided to the system: the generation of
energy reserves.
4.1 Basic Components

We propose eight different components: l, L, e, E, a, A, R, x. The small-letter
components l, e, a are the precursors of the capital-letter components L, E, A. The
membrane is a chain of L components (with a rather structural role), into which E
components (with more active functions) can be inserted. R components,
energy-reserve compounds, are made of a certain number of A components. There are two
possible ‘energy currencies’ in the system: a “chemical” one, based on the reaction
between a and A components, plus an “osmotic” one, based on the concentration
gradient of x. This is fully coherent with Skulachev's laws of bioenergetics [18],
which state that the viability of a biological system requires at least one currency
related to membrane processes and one related to internal reactions.

Capital-letter components cannot cross the membrane (L and E can only be added
to or subtracted from it), whereas small-letter ones can. However, the x component
can only be transported by means of the E components inserted in the membrane
(mediated transport, see Table 1 below). Therefore, regarding possible spatial states,
small-letter components can be ‘in’ or ‘out’; L and E components can only be found
inside the system or as part of the membrane (‘mem’ state); and A and R components
have just one possible spatial state (‘in’). Accordingly, the “movement” of such
components is restricted to changes in those spatial states. Finally, the system can
only receive the input of energy from the environment through some of the E_mem
components, which become activated into E*_mem.
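The admissible spatial states just listed can be captured in a small lookup table. The sketch below is one hypothetical encoding (the dictionary `ALLOWED_STATES` and the function `transition_kind` are illustrative names, not part of the model description); it classifies a requested change of spatial state as passive transport, mediated transport, membrane insertion/removal, or forbidden. An activated E*_mem is treated here simply as an E component in state ‘mem’.

```python
# Hypothetical encoding of the spatial states of Sect. 4.1.
ALLOWED_STATES = {
    "l": {"in", "out"}, "e": {"in", "out"}, "a": {"in", "out"}, "x": {"in", "out"},
    "L": {"in", "mem"}, "E": {"in", "mem"},
    "A": {"in"}, "R": {"in"},
}
MEDIATED_ONLY = {"x"}   # x crosses the membrane only through E_mem components

def transition_kind(component, src, dst):
    """Classify a change of spatial state of one component."""
    allowed = ALLOWED_STATES[component]
    if src == dst or src not in allowed or dst not in allowed:
        return "forbidden"
    if {src, dst} == {"in", "mem"}:
        return "membrane"    # L or E added to, or removed from, the membrane
    if component in MEDIATED_ONLY:
        return "mediated"    # requires an E component in the membrane (Table 1)
    return "passive"         # l, e, a: direction set by the gradient (not checked here)

print(transition_kind("l", "in", "out"))    # passive
print(transition_kind("x", "out", "in"))    # mediated
print(transition_kind("E", "in", "mem"))    # membrane
print(transition_kind("A", "in", "out"))    # forbidden
```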



4.2 Fundamental Processes 

There are two main types of interaction processes: transformation reactions and 
transport processes. Although they may become more complicated as a result of later 
couplings, in principle both reactions and transport processes will involve two 
different components. In the first case, one component will transform into another; in 
the second, one will change its spatial state in the presence of a membrane component 
(L or E, depending on which kind of component is being transported). 

Processes will be classified as exergonic (ΔG < 0) or endergonic (ΔG > 0), according
to whether they may occur spontaneously or not. For instance, degradation reactions
will generally be exergonic, whereas synthetic ones will need to be coupled to have
some chance of occurring. Concentration gradients set the spontaneous direction of
passive transport processes (those in which l, e, a components get "moved" in or out),
but active transport escapes this rule and depends on other factors (like the external
input of energy, Δe, that E_mem components may receive). In addition, each
fundamental process is assigned a ‘probability function’, which determines the rate or
frequency with which the process will actually take place.
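One simple way to let probability functions set the frequency of processes is weighted random selection: at each step one process is drawn with probability proportional to its current probability value. The sketch below assumes this scheme (the text does not specify how processes are sampled); the process names are taken from Table 1 but the weights are hypothetical.

```python
import random
from collections import Counter

def sample_processes(weights, n_steps, seed=0):
    """Draw n_steps processes, each chosen with probability proportional to its weight."""
    rng = random.Random(seed)
    names = list(weights)
    picks = rng.choices(names, weights=[weights[name] for name in names], k=n_steps)
    return Counter(picks)

# Hypothetical probability values for three processes of Table 1.
counts = sample_processes({"L_in -> l_in + q": 3.0,
                           "l_in -> L_in": 1.0,
                           "A_in -> a_in + Q": 0.0},
                          n_steps=100_000)
print(counts)   # about 75% degradations of L, 25% syntheses, no A degradation
```

In a fuller simulation the weights would be recomputed after every step, since the probability functions are meant to depend on concentrations and temperature.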

Certain exergonic processes (when not coupled with endergonic ones) may give
off some heat. This heat contributes to a global increase of the temperature of the
system. However, heat does not accumulate in the system, since the temperature
gradient established between the system and the environment causes a net loss
of heat from the system, and a subsequent decrease of its temperature. This reflects
the important idea of having a flow of energy through the system (rather than just an
input of energy) to keep it running.
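The claim that heat does not accumulate can be illustrated with a one-variable update rule. The sketch assumes a Newton-type loss term proportional to T - T_env, together with hypothetical values for heat capacity and loss rate; the text only states that the temperature gradient drives heat out of the system. With a constant heat release q per step, the temperature settles at T_env + q/(C·h) instead of growing without bound.

```python
def step_temperature(temp, heat_released, t_env, heat_capacity, loss_rate):
    """One update: released heat raises T; loss to the environment is
    proportional to the gradient T - t_env (an assumed Newton-type law)."""
    return temp + heat_released / heat_capacity - loss_rate * (temp - t_env)

def simulate_temperature(steps, heat_per_step, t0=300.0, t_env=300.0,
                         heat_capacity=10.0, loss_rate=0.1):
    temp = t0
    for _ in range(steps):
        temp = step_temperature(temp, heat_per_step, t_env, heat_capacity, loss_rate)
    return temp

# Constant heat release: T settles at t_env + heat / (heat_capacity * loss_rate) = 302.
print(simulate_temperature(500, heat_per_step=2.0))
# No heat release: a hot system cools back towards t_env.
print(simulate_temperature(500, heat_per_step=0.0, t0=350.0))
```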



Table 1. Fundamental processes

‘Synthesis’
   l_in → L_in                        ΔG > 0, P
   e_in → E_in                        ΔG > 0, P
   a_in → A_in                        ΔG > 0, P
   L_in → L_mem                       ΔG dep. on |L|, P
   E_in + L_mem → E_mem + L_in        ΔG > 0, P
   E_mem + Δe → E*_mem                ΔG < 0, P
   nA → R                             ΔG dep. on |A|, P

‘Degradation’
   L_in → l_in + q                    ΔG < 0, P
   E_in → e_in + q                    ΔG < 0, P
   A_in → a_in + Q                    ΔG < 0, P
   L_mem → L_in                       ΔG dep. on |L|, P
   E_mem → E_in + q                   ΔG < 0, P
   E*_mem → E_mem + q                 ΔG < 0, P
   R → nA                             ΔG dep. on |A|, P

‘Non-mediated Transport’
   l_in + L_mem → l_out + L_mem       ΔG dep. on Δ|l|, P_l,io
   l_out + L_mem → l_in + L_mem       ΔG dep. on Δ|l|, P_l,oi
   e_in + L_mem → e_out + L_mem       ΔG dep. on Δ|e|, P_e,io
   e_out + L_mem → e_in + L_mem       ΔG dep. on Δ|e|, P_e,oi
   a_in + L_mem → a_out + L_mem       ΔG dep. on Δ|a|, P_a,io
   a_out + L_mem → a_in + L_mem       ΔG dep. on Δ|a|, P_a,oi

‘Mediated Transport’
   (a) if |x|_in > |x|_out:
       E_mem + x_in → E_mem + x_out (+ q)     ΔG < 0, P_x,io
       E*_mem + x_out → E_mem + x_in          ΔG < 0, P_Ex,oi
   (b) if |x|_in < |x|_out:
       E*_mem + x_in → E_mem + x_out          ΔG < 0, P_Ex,io
       E_mem + x_out → E_mem + x_in (+ q)     ΔG < 0, P_x,oi
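The ‘Mediated Transport’ rows of Table 1 follow a single rule: moving x down its concentration gradient needs only an E_mem carrier and releases heat q, whereas moving it against the gradient consumes an activated E*_mem. A minimal sketch of that rule, with a hypothetical function name; the handling of equal concentrations is an assumption, since the table lists only the strict cases.

```python
def mediated_transport(x_in, x_out, direction):
    """Carrier and heat for moving one x across the membrane.

    direction: "in->out" or "out->in".
    Returns (carrier, heat_released):
      downhill moves use E_mem and release heat q (the "(+ q)" rows);
      uphill moves consume an activated E*_mem, which reverts to E_mem.
    Equal concentrations are treated as uphill here (an assumption).
    """
    if direction == "in->out":
        downhill = x_in > x_out
    elif direction == "out->in":
        downhill = x_out > x_in
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    return ("E_mem", True) if downhill else ("E*_mem", False)

# Case (a): |x|_in > |x|_out
print(mediated_transport(10, 2, "in->out"))   # ('E_mem', True)
print(mediated_transport(10, 2, "out->in"))   # ('E*_mem', False)
# Case (b): |x|_in < |x|_out
print(mediated_transport(2, 10, "in->out"))   # ('E*_mem', False)
print(mediated_transport(2, 10, "out->in"))   # ('E_mem', True)
```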



4.3 Main Variables, Parameters of Control and Rules of the Model 

The main variables of the model are the concentrations of all components inside the 
system, and the size and composition of the membrane (number of L and E 
components, from which a “volume” of the system can be derived). 

The probability functions associated with each type of process could initially be taken as
external parameters, but it is probably better to treat them as variables
that depend on the concentrations of the components involved and on the
temperature. In this context, temperature is defined as a global variable related to the
heat given off in certain uncoupled exergonic processes. This
variable would influence not only the values of the probability functions of
the different processes considered in the simulation, but also a general
probability function P_g that can induce changes in the tendency of the system to
establish couplings.

Among the variables we also introduce the osmotic coefficient (φ), which is
directly related to the balance of concentrations of all components inside and outside
the system. If φ crosses a certain threshold value, the system will “burst”.
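To make the burst criterion concrete, the sketch below makes two hypothetical modelling choices that the text leaves open: (i) the internal volume follows from the membrane size by treating the system as a sphere whose surface is tiled by its L and E components, and (ii) φ is the ratio of total internal to total external concentration. The text itself only states that φ depends on the balance of inside and outside concentrations and that the system bursts when φ crosses a threshold.

```python
import math

def volume_from_membrane(n_l, n_e, area_per_component=1.0):
    """Volume of a sphere whose surface area is (n_l + n_e) * area_per_component."""
    area = (n_l + n_e) * area_per_component
    radius = math.sqrt(area / (4.0 * math.pi))
    return 4.0 / 3.0 * math.pi * radius ** 3

def osmotic_coefficient(internal_counts, external_concentration, volume):
    """Hypothetical phi: total internal concentration over total external concentration."""
    internal_concentration = sum(internal_counts.values()) / volume
    return internal_concentration / external_concentration

def has_burst(phi, phi_critical):
    return phi > phi_critical

volume = volume_from_membrane(n_l=90, n_e=10)
phi = osmotic_coefficient({"l": 20, "a": 20, "x": 10}, external_concentration=1.0, volume=volume)
print(round(volume, 1), round(phi, 2), has_burst(phi, phi_critical=1.5))
```

Because volume grows as the 3/2 power of membrane size under this assumption, membrane growth is one way the system can relieve osmotic pressure, alongside transport of x.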

The number of control parameters and the type of conservation rules that will be
used depend, of course, on how the environment is conceived. As a first
approximation, we shall consider the environment as infinite (or very large compared
to the system). Hence, external concentrations will be included as control
parameters, as will the external “temperature”. In addition, the following
parameters are considered: the initial concentration of all components inside the
membrane, the size and composition of the initial membrane and its
minimum/maximum sizes, the free-energy differences of the different processes, the
heat losses associated with various exergonic processes, the initial values of the
probability functions, the critical value of the osmotic coefficient, and the number of
A components needed to make up one R. Last but not least, a special control parameter
is included: the energy-input function. Initially, for simplicity, it can be
set to a constant value. However, in order to analyze the behavior of the system under
variable external conditions, this function would be modulated accordingly.

Taking the environment as an infinite reservoir of basic components, conservation 
rules will only apply to internal processes. This is coherent with the idea of defining 
the system as thermodynamically open. Therefore, although external concentrations 
remain constant, within the boundaries of the system the number of components will 
be limited. A similar criterion is considered for energy balance in the model. 

Finally, several rules for coupling processes should be introduced. These are meant
to restrict the large number of possible couplings that the system could establish.
As a starting point, we may assume that: (a) only two processes can be coupled at the
same time, excluding the possibility of one being the inverse of the other; (b) only
two reactions, or a reaction and a transport process, can couple; (c) the
combination of the two processes must be exergonic (i.e., ΔG < 0) and is assigned a
new probability function; (d) once established, the couplings remain fixed.
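Rules (a) to (d) translate directly into a validity check plus a registry that never forgets a coupling. In the sketch below, the `Process` record, the registry class, and all free-energy values are hypothetical illustrations; only the four rules come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Process:
    name: str
    kind: str            # "reaction" or "transport"
    delta_g: float       # free-energy change (hypothetical units)
    inverse: str = ""    # name of the reverse process, if any

def can_couple(p, q):
    """Rules (a)-(c) for coupling two processes."""
    if p.name == q.name or p.inverse == q.name or q.inverse == p.name:
        return False                     # (a) two distinct, non-inverse processes
    if p.kind == "transport" and q.kind == "transport":
        return False                     # (b) reaction+reaction or reaction+transport only
    return p.delta_g + q.delta_g < 0     # (c) the combination must be exergonic

class CouplingRegistry:
    """Rule (d): a coupling, once established, is never removed."""

    def __init__(self):
        self._pairs = set()

    def establish(self, p, q):
        if not can_couple(p, q):
            return False
        self._pairs.add(frozenset((p.name, q.name)))
        return True

    def is_coupled(self, p, q):
        return frozenset((p.name, q.name)) in self._pairs

# Hypothetical free-energy values, chosen only to illustrate the rules.
synth_L = Process("l_in -> L_in", "reaction", +3.0, inverse="L_in -> l_in + q")
degr_L = Process("L_in -> l_in + q", "reaction", -3.0, inverse="l_in -> L_in")
degr_A = Process("A_in -> a_in + Q", "reaction", -5.0)
pump_x = Process("E*_mem + x_out -> E_mem + x_in", "transport", -1.0)
leak_a = Process("a_in + L_mem -> a_out + L_mem", "transport", -0.5)

registry = CouplingRegistry()
print(registry.establish(synth_L, degr_A))   # True: +3 - 5 < 0
print(registry.establish(synth_L, degr_L))   # False: inverse processes
print(registry.establish(pump_x, leak_a))    # False: two transport processes
```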



4.4 Restatement of the Problem and Main Goals of the Model 

After presenting the basic features of the model, we can pose the problem it addresses
in a more concrete way. The fundamental aim could be stated as follows: given (i) a
membrane and a set of components with their initial concentrations, plus a set of
possible interactions among them with their corresponding probability functions, and
(ii) the values of all the other control parameters, explore which sets of couplings
allow the system to reach self-maintenance.

In principle, we expect four different outcomes: 

a) Disintegration, either by progressive decay (the membrane shrinking to its 
minimum size) or by annihilation of any of the system variables. 

b) “Burst”, if the osmotic coefficient crosses its critical value. 

c) A stationary state in which the system self-maintains without growth. 
d) Progressive growth until the system reaches its critical maximum size. This
would require the introduction of a division process in the model.
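To make the four outcomes operational, a run has to be mapped onto one of them. The following is a minimal sketch assuming that the simulation exposes a membrane-size history, the current concentrations and the osmotic coefficient; the function name `classify`, the thresholds and the tolerance `tol` are hypothetical parameters, not values from the model.

```python
from enum import Enum
from typing import Optional, Sequence


class Outcome(Enum):
    DISINTEGRATION = "a"
    BURST = "b"
    STATIONARY = "c"
    GROWTH = "d"


def classify(sizes: Sequence[float], concentrations: Sequence[float],
             osmotic: float, *, min_size: float, max_size: float,
             osmotic_critical: float, tol: float = 1e-6) -> Optional[Outcome]:
    """Map the recent history of a (hypothetical) run onto outcomes a)-d).

    sizes: membrane size at successive time steps (at least one value);
    concentrations: current values of the internal system variables.
    Returns None while the system is still shrinking but has not disintegrated.
    """
    size = sizes[-1]
    trend = sizes[-1] - sizes[-2] if len(sizes) > 1 else 0.0
    # a) decay to the minimum size, or annihilation of a system variable
    if size <= min_size or any(c <= 0 for c in concentrations):
        return Outcome.DISINTEGRATION
    # b) the osmotic coefficient has crossed its critical value
    if osmotic > osmotic_critical:
        return Outcome.BURST
    # d) still growing, or already at the critical maximum size
    if size >= max_size or trend > tol:
        return Outcome.GROWTH
    # c) self-maintenance without growth
    if abs(trend) <= tol:
        return Outcome.STATIONARY
    return None
```

The checks are ordered so that the destructive outcomes a) and b) take precedence over the size trend.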

Hence, the idea is to create a computational tool with which we can study how
real self-maintenance is affected by various aspects of the model. This means
analyzing the influence of different factors on the system dynamics, such as the
energy input, the initial and external concentrations of free components, the original
size and composition of the membrane, the temperature, the osmotic coefficient, and the
critical size values.



5 Discussion and prospects 

This model framework is meant to be a step forward in our attempt to develop a more 
realistic and grounded ALife that may contribute to our understanding of some 
fundamental problems of Biology. Here the issue of self-maintenance in a cellular 
system has been specifically addressed. Nevertheless, being aware that our model 
assumes a set of working hypotheses that should be continuously revised, let us 
discuss briefly in this final section some problematic points. 

In the present model, space is highly abstracted. This could be a possible criticism
when compared to other options that do take into account spatial constraints and
movement, even in an idealized two-dimensional environment. However, the issue
is not completely disregarded. Our model takes into account that there must be a
significant distinction between an 'inside' and an 'outside' of the system. Furthermore,
transport processes are introduced and, in fact, they hold the key to avoiding the problem
of the osmotic crisis.

Another simplification of the model is that most of what relates to time and process 
rates is externally controlled. Nevertheless, the problem of synchronizing all the 
internal physico-chemical processes involved in the metabolic network is also 
important for autonomy [20]. In real biological systems, for instance, a complex web
of enzymatic compounds is responsible for solving that problem. However, for the sake of
simplicity, free catalysts have not been included as fundamental component-constraints
of the system.

Of course, more complicated virtual scenarios for subsequent development and 
improvement of this type of model could be easily imagined. For instance, a possible 
way to take into account external restrictions on material/energetic resources would be 
to introduce a situation where a population of different cellular systems (like the ones
proposed here) share a finite environment and struggle to self-maintain in it,
competing for the (limited) resources, growing and self-reproducing by simple 
division, disintegrating and giving out their components and energy to the 
environment, and so on. New issues that might alter some of the working hypotheses 
we assumed here will surely arise in that scenario. 
Abstract. Chemistry is basically the formation of complexes by combining
basic elements and/or simpler complexes, or by splitting bigger
complexes, as a result of chemical reactions. The basic elements are glued together as a
consequence of the affinity that exists between them. This affinity can be explicit,
implicit, symmetric or asymmetric. Object-Orientation offers a natural
computational framework to capture the very simple and recurrent mechanism
that underlies chemistry. The basic elements and the complexes constitute the
low-level classes, which can be naturally organized in a hierarchical structure.
Every complex is an object, constructed on the basis of simpler complexes or basic
elements, and structured as a special type of ordered computational tree (when
affinity is symmetric) or graph (in the asymmetric case). Among the attributes of
these objects are their concentration and their reactivity, which are essential
values for simulating the dynamics of the whole system. This paper introduces, in a
UML type of representation, some basic computational structures and mechanisms
which should find a natural place in any simulation concerned with complex formation
and the time variation of the complexes' concentration and reactivity.



1 Introduction 

From its origins, Alife has always regarded as fundamental the design of software
environments able to model the appearance and disappearance of complexes
constructed from simpler complexes or basic elements. This preoccupation has led to a
succession of milestones in Alife research such as Fontana’s alchemy [3],
Kauffman’s autocatalytic networks [5] or Holland’s echo models [4]. These software
environments, like most of the work being done (or which should be done) in
the “Alife spirit”, could have two major applications. On the one hand, after adequate
parameterization, they could be offered to chemical or biological practitioners, who
could find in these “computational platforms” a helpful way to model a particular
system. We can then speak of a set of computational design patterns, or a software
shell, which could easily be adapted to the simulation of a particular chemical or
biological environment. On the other hand, these simulations can stand on their own
and allow the discovery of generic laws, expressed in mathematical or linguistic terms,
characterizing the behaviour of emergent and complex systems. For instance, in
Kauffman’s simulation of Boolean nets [5], some laws establish the stability or
instability of generic networks as a function of their connectivity. Others
set the number of attractors as a function of the number of units in the network. In our
specific case, these laws could relate, for instance, the number and the maximal size
of possible complexes to the number of basic elements, the number of affinity
sites, the number of sources or any time constant for the concentration/reactivity
variation.

This work departs from its ancestors by relying essentially on Object-Oriented
(OO) computation instead of arbitrary or ad-hoc computational tricks. Its originality
lies in the way the chemical environment is conceptualized as a network of classes
and their specific interactions. From its origins, OO computation has allowed
programming to come closer to physical simulation (the first OO language was indeed
called “Simula”) instead of being constrained by the processor's set of elementary
instructions. Today there is a growing trend to abstract software engineering away
from the processor by using the high-level concepts that make up the problem as the
building blocks of its solution. This goes together with the
increased use of visual modeling languages such as UML [6]. UML proposes a set of
well-defined diagrams (transcending any specific OO programming language) to
naturally describe and solve problems with the high-level concepts inherent in the
formulation of the problem. It is enough to discover the main actors of the problem
and how they mutually relate and interact in time to build the algorithmic solution of
this problem. It is beyond the scope of this paper to present UML, although some of its
symbols will be used to describe our chemical software environment. A simple,
introductory overview of the UML language can be found in [6]. However, since we
deliberately restrict our use of UML to the class diagram alone, readers familiar
enough with OO programming should have no difficulty following it.

The first section of the paper will describe and discuss the basic classes of our
chemical software environment, their main attributes and methods, and how they
relate to each other, i.e. put into words what the class diagram contains. The second
section will describe in more detail what a complex is and how it is structured by
means of classical computational constructs such as trees and graphs. The third
section will discuss the key notion of the identity of a cell and of a complex, and how the
latter can easily be duplicated as the result of a chemical reaction. The fourth section
will tackle the dynamic aspects: what the time-changing variables are and how
they change. The last section will describe some preliminary experimental
simulations, discuss the problems posed by visualizing the simulation, and sketch some
first observations.



2 The Chemical OO Class Diagram

As indicated in Fig. 1, the six basic classes of our software environment in the UML
class diagram are Component, SourceComponent, CellularComponent, Cell,
Complex and CellInComplex.

Component: It is the highest super-class and provides, as methods and attributes, the
properties to be inherited by all components of the environment, whether sources,
cells or complexes: essentially their concentration and how this
concentration changes in time. Concentration is an essential property of any object
of the simulation because, for obvious memory size constraints, an object of the class
Component is not a single chemical object but rather the set (whose cardinality is the
concentration) of all identical chemical objects. The concentration is modified
either by natural decrease or increase, or as a result of chemical reactions and their
specific rates.
Fig. 1. The UML class diagram of our chemical OO environment

SourceComponent: It is the first sub-class of Component. The energy sources are
objects of this class. Their role is to supply energy to the cellular components (this
justifies the diagram link between SourceComponent and CellularComponent), which,
by absorbing this energy, can increase their concentration and their reactivity. Each
cell presents specific receptors to the sources. The sources can remain at constant
concentration or decrease as a function of the cellular components' absorption.

CellularComponent: It is the second sub-class of Component and is the super-class of
two further sub-classes, Cell and Complex, which are the essential elements to be
combined as products of chemical reactions. Their potential to combine is a
function of a new attribute called reactivity. For any complex to form or any
reaction to occur, cells or complexes must possess a certain minimum reactivity.
Since the main activity of these cellular components is to connect to each
other, one essential method tests their mutual affinity to detect possible
connections.
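The class hierarchy described so far can be transcribed directly into an OO language. The Python sketch below is illustrative only: the paper specifies the UML diagram, not code, so the method name `change_concentration`, the `Counter`-based `number_of_instances` and all default values are our assumptions, and the affinity methods are omitted.

```python
from collections import Counter
from typing import List


class Component:
    """Top super-class: one object stands for the whole set of identical chemical objects."""

    def __init__(self, concentration: float = 0.0):
        self.concentration = concentration  # cardinality of the represented set

    def change_concentration(self, delta: float) -> None:
        # natural increase/decrease or effect of a reaction; never below zero
        self.concentration = max(0.0, self.concentration + delta)


class SourceComponent(Component):
    """Energy source supplying the cellular components."""


class CellularComponent(Component):
    """Super-class of Cell and Complex: adds the reactivity attribute."""

    def __init__(self, concentration: float = 0.0, reactivity: float = 0.0):
        super().__init__(concentration)
        self.reactivity = reactivity


class Cell(CellularComponent):
    def __init__(self, identity: int, sites: List[int], concentration: float = 1.0):
        super().__init__(concentration)
        self.identity = identity               # ordered integer index
        self.sites = list(sites)               # affinity sites
        self.complexes: List["Complex"] = []   # the "1-0..*" association


class Complex(CellularComponent):
    def __init__(self, cells: List[Cell], concentration: float = 0.0):
        super().__init__(concentration)
        # numberOfInstances: how many times each cell identity occurs in the complex
        self.number_of_instances = Counter(c.identity for c in cells)
        for c in cells:
            if self not in c.complexes:
                c.complexes.append(self)
```

Using a `Counter` keyed by cell identity instead of a dense integer vector is a design choice: it stays compact when only a few of many possible cells occur in a complex.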
Cell: It is the first sub-class of CellularComponent and describes the basic objects of the whole
system: the cells. The fundamental structural attribute of the cell is its list of affinity
sites, which allow cells to combine together in order to form complexes. Many
variations are possible in the way these affinity sites characterize the cell. Depending
on the choice, the affinity between two cells will be explicit, implicit, symmetric or
asymmetric. Consider the simplest case, explicit and symmetric, where every cell has
a set of sites which simply have to match any of the sites of any other cell. A vector
of integer values $S_i[m]$ could characterize cell $i$, such that for cell $i$ to connect with
cell $j$ it would be enough to have, for one pair of vector elements, $S_i[m] = S_j[n]$. The
affinity between cells $i$ and $j$ will then be symmetric, i.e. the complex $i \leftrightarrow j$ is
symmetric. Here we restrict ourselves to a yes-or-no type of connectivity. In previous works
dealing with immune networks, we have allowed a smoother type of connectivity [2].
The asymmetric situation occurs as soon as the cell presents two kinds of affinity sites
(typical in biology), naturally called the keys $K_i[m]$ and the locks $L_j[n]$, and
connection is only possible between a key of the first cell and a lock of
the second. In such a case the complex $i \rightarrow j$ will be asymmetric, and therefore different from
the complex $j \rightarrow i$.
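Both explicit affinity tests reduce to checking whether two sets of site values intersect; only the sets being compared differ. A minimal Python sketch, with hypothetical function names, where each cell's sites are given as lists of integers:

```python
from typing import Iterable


def symmetric_affinity(s_i: Iterable[int], s_j: Iterable[int]) -> bool:
    """Explicit, symmetric affinity: some S_i[m] equals some S_j[n]."""
    return not set(s_i).isdisjoint(s_j)


def asymmetric_affinity(keys_i: Iterable[int], locks_j: Iterable[int]) -> bool:
    """Explicit, asymmetric affinity i -> j: some key K_i[m] matches some lock L_j[n]."""
    return not set(keys_i).isdisjoint(locks_j)
```

For example, if cell 1 has keys `[4]` and locks `[9]` while cell 2 has keys `[1]` and locks `[4]`, then `asymmetric_affinity([4], [4])` is true (1 → 2) but `asymmetric_affinity([1], [9])` is false (2 → 1), which is exactly why the two complexes differ.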

The implicit type of affinity occurs when two cells connect (yes or no) as a result of a
function of their respective affinity sites. For instance, there could be a connection between
$i$ and $j$ only if $S_i[m] + S_j[n] = 8$ (we are closer to chemistry now, with the well-known
“octet rule”, where affinity between basic elements is implicit and symmetric).
Irrespective of whether the affinity is implicit or explicit, which in the end always leads
to a yes-or-no answer, the really significant difference for the rest of the paper is whether it is
symmetric or asymmetric. As shown in the class diagram, the cells are
associated with the energy sources and have an additional attribute (the “1-0..*”
association with the Complex class) listing all the complexes in which they appear.
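The implicit, octet-like rule replaces equality of sites by a function of the two sites. A minimal sketch, assuming the function is the sum of one site from each cell and the target value is a parameter (`total`, hypothetical, defaulting to 8 as in the example):

```python
from itertools import product
from typing import Sequence


def implicit_affinity(s_i: Sequence[int], s_j: Sequence[int], total: int = 8) -> bool:
    """Implicit affinity: i and j connect if some pair of sites sums to `total`.

    With total=8 this mimics the octet-rule example; addition is commutative,
    so the resulting affinity is symmetric by construction.
    """
    return any(a + b == total for a, b in product(s_i, s_j))
```

Swapping in another function (e.g. a non-commutative one) would yield an implicit but asymmetric affinity, the fourth combination mentioned above.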

A final attribute of the cell is its identity, which simply needs to be an ordered index.
The way this identity is defined clearly depends on what we take to be unique to
any cell (the identity corresponds to the key in database terminology). One
natural option would be to relate this identity to the affinity sites, and to uniquely
calculate the identity index as a function of the affinity (whether symmetric or not).
However, this is not an obligation, and one could just impose an identity independently of the
way cells mutually connect, i.e. two cells with the same affinity sites could still be
different (this seems to be the case in the chemical periodic table, where
different elements with the same affinity mechanism are grouped together).

Complex: It is the second main sub-class of CellularComponent and describes the second
essential family of components: the complexes. These complexes are the products of the
aggregation of cells or simpler complexes. The following section will show how
complexes are structured and what in this structure makes them unique. As shown in the
class diagram, complexes are aggregates of cells. An attribute called
numberOfInstances is a vector of integers whose elements are the number of times each
specific cell appears in the complex. Like all computational objects, complexes need to
be “constructed”. Here at most four types of constructor methods are necessary to
construct complexes: 1 - from two cells (giving rise to the most basic complex); 2 -
from one cell and one complex (which can be split into two different constructors
when the affinity is asymmetric); 3 - from two complexes. Complex relates
to its most direct partner, CellInComplex, which is responsible for the
structural description of the complex. Each complex has one and only one
CellInComplex called the “headCell”, which can be seen as its “front door”.

CellInComplex: as soon as a cell gets into a complex, it is transformed into a
CellInComplex object. CellInComplex relates to Cell since the identity of such an
object is the same as that of its associated cell. CellInComplex objects are responsible for coding the
tree or graph structure of the complex, to be described in the next section. It is
enough to know that, as classically done for trees and graphs [1], and as shown in
the class diagram, CellInComplex has pointer attributes (mySubCells, mySupCells,
myRightCell, myLeftCell) pointing to other CellInComplex objects. This is the well-known
computational trick that allows tree or graph structures to be treated recursively
in the easiest way. Finally, all natural methods associated with Complex that allow
testing the affinity between two complexes, comparing two complexes and duplicating
one complex have natural counterparts in the CellInComplex class. Thus two
CellInComplex objects can test their mutual affinity, be compared and be duplicated.



3 The structure of a Complex 

Take the simplest case where two cells “1” and “2” (in the following we will label each
cell by just its integer index) present asymmetric affinity (for instance, one key of
“1” matches one lock of “2”). A new complex appears: “1” is the headCell of this
complex (an object of class CellInComplex) and it has one subCell “2” (also an object
of class CellInComplex) (Fig. 2). Suppose then that “1” connects to a second cell “5”;
the new complex is depicted in Fig. 3, where “1” now has a vector of two subCells, “2”
and “5” (a vector of two CellInComplex objects). It is important to keep the subCells ordered
so that “2” is on the left of “5”. Instead of a vector of subCells, one possible
alternative would be to keep only “2” as a subCell, and to give “2” the
rightCell “5”. “1”, “2” and “5” are all CellInComplex objects with, as subCells and
rightCells, pointers to CellInComplex objects still to come.
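The CellInComplex node and its ordered subCells can be sketched as follows. This is an illustration, not the authors' code: the snake_case attribute names mirror mySubCells, mySupCells, myRightCell and myLeftCell, and ordering by the integer index alone is our simplification (two subCells with the same index would need a deeper, recursive tie-break that this sketch does not perform).

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False, repr=False)  # identity-based equality; no recursive repr through back-pointers
class CellInComplex:
    identity: int                                                   # same index as the underlying cell
    sub_cells: List["CellInComplex"] = field(default_factory=list)  # kept in increasing index order
    sup_cells: List["CellInComplex"] = field(default_factory=list)
    right_cell: Optional["CellInComplex"] = None
    left_cell: Optional["CellInComplex"] = None

    def add_sub_cell(self, other: "CellInComplex") -> None:
        """Attach `other` below this cell, keeping sub_cells sorted by index."""
        keys = [c.identity for c in self.sub_cells]
        self.sub_cells.insert(bisect.bisect_right(keys, other.identity), other)
        other.sup_cells.append(self)
```

Building the complex of Fig. 3 in either order, `head = CellInComplex(1)` followed by `head.add_sub_cell(CellInComplex(5))` and `head.add_sub_cell(CellInComplex(2))`, leaves the subCells as “2” then “5”.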



Fig. 2. The simplest complex. Fig. 3. A three-cell complex





So far, complexes seem to be structured as “Leftmost-Child-Right-Sibling” trees [1].
The reason for maintaining a left-to-right increasing order in the position of the
subCells is to ease the detection of equal complexes. Can the structure of a complex be
kept as a simple ordered tree? Let’s go one step further and suppose that we now have
two complexes as indicated in Fig. 4, and that the CellInComplex “5” of the first complex
has affinity with the CellInComplex “6” of the second complex (a key of “5” matches
a lock of “6”). The figure shows the reaction and its result:




Fig.4. Merging two tree complexes to obtain a new graph complex. 

Unlike in a classical tree, any CellInComplex in the structure (except the headCell) can
be pointed to by more than one pointer (here “6”). Also,
the headCell in the resulting complex, “2”, has a new rightCell “3”. In detail, in the
resulting complex: “2” has rightCell “3” and has subCells “3” and “5”; the first “3”
has subCell “1”; “5” has subCells “3”, “6” and “8” (the subCells are well ordered);
the second “3” has subCells “6” and “9” (here again it is important to check the order
of the subCells). For asymmetric complexes, the two rules to be obeyed in order to
preserve the identity of the complex are:

1: a headCell can have rightCells, so that a complex has, in fact, an ordered
succession of front doors.

2: the whole vector of subCells must be ordered. 

Suppose that a similar reaction occurs again, but now in the symmetric affinity case.
The good news in the symmetric case is that it is quite easy to preserve the tree
structure (as shown in Fig. 5). The order must be kept at the level of the subCells.
There is only one headCell, with no rightCells. Due to the “vertical symmetry”
(besides the horizontal symmetry, which entails the subCells order), the only novelty
is to keep as headCell the one with the lowest integer index among all the cells on the
extreme left of the complex. So the two new rules are:

1. as in the asymmetric case, all the subCells must be ordered;

2. the headCell is the cell with the lowest integer index among all the
leftmost cells.

Both in the symmetric and asymmetric cases, there is no additional difficulty when
loops appear in the tree due to transversal connections among the cells.
We have seen that the identity of a cell is given by an integer index. The identity of a
complex is given by the cells inside it, the number of instances of each of them and
the way the connectivity is organized in a tree-like or graph-like structure. Whenever two
complexes must be compared to see, for instance, whether a possible new complex
resulting from a reaction already exists in the system, the comparison
mechanism is defined recursively, starting from the headCell of each of the
complexes. In order to make this recursive testing possible, the tree in the symmetric
case and the graph in the asymmetric case must comply with the two architectural
rules given in the previous section.
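For the symmetric (tree) case, the recursive comparison can be sketched in a few lines. The `Node` class is a hypothetical, stripped-down stand-in for CellInComplex; the sketch relies on the ordering rule, and it deliberately does not handle graphs with loops, which would additionally require a map of already-matched node pairs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False, repr=False)
class Node:
    """Minimal stand-in for CellInComplex: an index plus ordered subCells."""
    identity: int
    sub_cells: List["Node"] = field(default_factory=list)


def same_complex(a: Node, b: Node) -> bool:
    """Recursively compare two tree complexes starting from their headCells.

    Because sub_cells are kept sorted, equal complexes have matching sub-trees
    in the same positions, so a position-by-position comparison suffices.
    """
    if a.identity != b.identity or len(a.sub_cells) != len(b.sub_cells):
        return False
    return all(same_complex(x, y) for x, y in zip(a.sub_cells, b.sub_cells))
```

Without the ordering rule, the comparison would have to try every pairing of subCells at each level, which grows factorially with the branching factor.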

Any time a reaction occurs between two complexes, i.e. when a cell of the first
connects with a cell of the second, two duplications must take place, roughly: A + B
→ A + B + AB. AB is constructed by first duplicating A and B and then
connecting these two duplicates, adding a link between the two affine cells. Like the
affinity testing and the comparison mechanism, the duplication is done recursively,
starting in all cases from the headCell. In the case of graphs, since any
CellInComplex could be visited more than once, each contains a very helpful
attribute called “aCopy”, which is assigned the exemplar of its last duplication.
Thus, whenever a CellInComplex is visited a second time in order to be duplicated,
it is enough to return its copy and stop the duplication propagation. Once a
whole complex is duplicated, all “aCopy” attributes are reset to null.
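The “aCopy” mechanism is a memoized recursive graph copy. A minimal sketch, assuming a hypothetical `Node` class that keeps only the index, the subCells and the `a_copy` attribute (rightCells would be followed in exactly the same way):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False, repr=False)
class Node:
    identity: int
    sub_cells: List["Node"] = field(default_factory=list)
    a_copy: Optional["Node"] = None  # the "aCopy" attribute: last duplicate of this node


def _duplicate(node: Node, visited: List[Node]) -> Node:
    if node.a_copy is not None:
        # already duplicated in this pass: return the copy and stop propagating
        return node.a_copy
    clone = Node(node.identity)
    node.a_copy = clone  # set before recursing so that loops terminate
    visited.append(node)
    clone.sub_cells = [_duplicate(child, visited) for child in node.sub_cells]
    return clone


def duplicate(head: Node) -> Node:
    """Duplicate a whole complex from its headCell, then reset every aCopy to None."""
    visited: List[Node] = []
    try:
        return _duplicate(head, visited)
    finally:
        for node in visited:
            node.a_copy = None
```

Assigning `a_copy` before descending into the children is what makes a shared node (like “6” in Fig. 4) appear once in the copy and lets loops terminate.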



5 The dynamical aspects 

Beyond the metadynamics just discussed, dynamics is a key addition to our software
environment. In many Alife simulations, the dynamics (how variables associated
with objects change in time) and the metadynamics (how the set of objects changes in
time through appearance and disappearance) are kept separate. In this work, they can
be taken into account simultaneously. However, like all the
software mechanisms and constructs proposed here, these dynamical aspects are
something to adapt and parameterize for any particular application. In the
following, only some very simple and preliminary default mechanisms will be
proposed. The two main dynamic variables, associated with all CellularComponents,
are the concentration and the reactivity.

For the sake of clarity, complexes will be denoted by upper-case letters and free cells
(cells not in a complex) by lower-case letters. The reactivity variable of a complex A, i.e.
R(A,t), can change in time and plays the following role: a reaction between two
complexes is possible only if the lower reactivity of the two complexes is
above a certain threshold: min(R(A,t), R(B,t)) > threshold to obtain a complex AB.
The reactivity of any cell is a function of the sources of energy and the energy
receptors of the cell. This reactivity changes in time as a function of the sources of
energy: R(a,t+dt) = f(R(a,t), sources(t)). The reactivity of a complex is obtained from
the reactivity of the cells inside, either by summing or averaging the cell reactivities.
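The reactivity rules can be sketched as three small functions. The threshold test and the sum/average aggregation follow the text; the update `next_cell_reactivity`, with its `gain` and `decay` parameters, is a purely hypothetical choice of f, since the paper leaves f to be parameterized for each application.

```python
from typing import Sequence


def complex_reactivity(cell_reactivities: Sequence[float], mode: str = "sum") -> float:
    """Reactivity of a complex from its cells: either the sum or the average."""
    if mode == "sum":
        return sum(cell_reactivities)
    if mode == "mean":
        return sum(cell_reactivities) / len(cell_reactivities)
    raise ValueError(f"unknown mode: {mode!r}")


def can_react(r_a: float, r_b: float, threshold: float) -> bool:
    """A + B -> AB is allowed only if min(R(A,t), R(B,t)) > threshold."""
    return min(r_a, r_b) > threshold


def next_cell_reactivity(r: float, sources: float,
                         gain: float = 0.1, decay: float = 0.05) -> float:
    """Hypothetical f in R(a,t+dt) = f(R(a,t), sources(t)): linear gain minus decay."""
    return r + gain * sources - decay * r
```

Note that summing makes large complexes ever more reactive, while averaging keeps reactivity on the scale of a single cell; which is appropriate depends on the application.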

The concentration of a complex A, i.e. C(A,t), can change in time. When two
complexes could possibly merge, the probability of their merging will depend
on their respective concentrations (to keep things simple, here we have considered the
product), so that: P(A+B->AB) = k*C(A)*C(B), P() being the probability that the
reaction occurs. The concentration of a free cell changes in time as a function of the
sources: C(a,t+dt) = f(C(a,t),sources(t)). Now when a complex is created, the initial
concentration of the complex is given by a fraction of the product of the two
concentrations of the components to be merged:

C(AB,t) = 0; C(AB,t+dt) = k*C(A,t)*C(B,t) (as in an elementary second-order reaction, first order in each reactant)

The concentrations of the two merging complexes decrease accordingly:
C(A,t+dt) = C(A,t) - C(AB,t+dt) ; C(B,t+dt) = C(B,t) - C(AB,t+dt)
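The merge update above can be written as one step function, plus a stochastic wrapper that uses P(A+B->AB) to decide whether the merge happens at all. This is a sketch: the function names are hypothetical, and clipping the probability to [0, 1] is our addition so that it can be sampled.

```python
import random
from typing import Tuple


def merge_probability(c_a: float, c_b: float, k: float) -> float:
    """P(A+B->AB) = k*C(A)*C(B), clipped to [0, 1] so that it can be sampled."""
    return min(1.0, max(0.0, k * c_a * c_b))


def merge_step(c_a: float, c_b: float, k: float) -> Tuple[float, float, float]:
    """Apply C(AB,t+dt) = k*C(A,t)*C(B,t) and subtract it from both parents.

    Assumes k*C(A) <= 1 and k*C(B) <= 1 so that neither concentration goes negative.
    """
    c_ab = k * c_a * c_b
    return c_a - c_ab, c_b - c_ab, c_ab


def maybe_merge(c_a: float, c_b: float, k: float,
                rng: random.Random) -> Tuple[float, float, float]:
    """Stochastically decide whether the merge happens during this time step."""
    if rng.random() < merge_probability(c_a, c_b, k):
        return merge_step(c_a, c_b, k)
    return c_a, c_b, 0.0
```

As a worked example, with C(A) = 0.5, C(B) = 0.4 and k = 0.5, one step gives C(AB) = 0.5 × 0.5 × 0.4 = 0.1, after which C(A) = 0.4 and C(B) = 0.3.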

The additional change in time of the concentration of a complex will be a function of
the cells inside the complex. One of the obvious benefits provided by the formation of
a complex is that all cells inside can take advantage of the others. So a CellInComplex
even poorly affected by the energy sources can grow in concentration or reactivity
because other CellInComplex objects are fed by the sources.



6 Visualizing and first results 

For various reasons it might be useful to visualize some simulation runs, for
instance to check that the simulation runs correctly or, more usefully, to detect emergent
phenomena like threshold or percolation effects. Fig. 6 shows a snapshot of the
running simulation with the simple visualization adopted here. Every cell's position is a function of
its identity (the cells are the squares, with the color indicating their concentration;
sources are shown by an additional cross). Complexes are shown by the links existing
between cells. The CellInComplex objects have a white circle inside. The simulation starts
with a certain number of free cells and a certain number of sources. Useful
information such as the current number of complexes, the size of the biggest complex,
the average concentration, and their evolution in time can be shown on the right side
of the main window.
Fig. 6. A snapshot of the running software

Two interesting preliminary observations are very reminiscent of qualitative results
already discussed in previous Alife works. Figs. 7a and 7b show the growth in time of
the size of the complexes; in both figures, the first plot just indicates the number of
distinct cells, the other plot indicates the number of all cells (accounting for cells
appearing several times). For certain values of the number of affinity sites and the
number of initial cells, as in Kauffman’s autocatalytic networks [5], one can observe
a kind of phase transition in the number of complexes and the size of the biggest
complex (very similar to percolation phenomena) that the simulation produces (Fig.
7a). All of a sudden, most of the complexes become cross-connected into one
giant structure. However, by playing with the source concentrations and the time
constants, this phase transition can become a dynamic, rather than a static
structural, phenomenon, i.e. one occurring only after some time, the time needed to reach
a sufficient reactivity or a sufficient concentration.

Certain complexes, once constructed, can grow in size indefinitely, if
not stopped by their concentration and the concentration of the ingredients they need
to grow. These are the complexes that simply loop back on themselves. They can very
easily grow indefinitely by constantly adding replicas of themselves. The simplest
case is obviously the complex composed of the same cell twice, one as the
receptor and the other as the connector, but it is enough to have one cell of a chain
connecting back to a previous element of this chain. In Fig. 7b only the plot showing
the total number of cells explodes in time. Fontana [3] made similar observations in
his alchemy simulations, and dealt with them by suppressing these self-replicating
structures as soon as they appeared.
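Whether a set of cells can form such self-looping complexes can be checked structurally before any simulation, as cycle detection on the affinity relation. The sketch below assumes asymmetric key/lock affinity encoded as a directed graph over cell identities (the dictionary format and function name are hypothetical) and uses a standard depth-first search with three colours. A cycle is only a necessary condition: as noted above, actual explosive growth still depends on the concentrations.

```python
from typing import Dict, List


def can_grow_indefinitely(affinity: Dict[int, List[int]]) -> bool:
    """Return True if the directed affinity graph contains a cycle.

    affinity[i] lists the cells j such that a key of i matches a lock of j (i -> j).
    A self-loop i -> i (one cell acting as both receptor and connector) and a chain
    that connects back to an earlier element both count as cycles.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current chain, finished
    colour = {node: WHITE for node in affinity}

    def visit(node: int) -> bool:
        colour[node] = GREY
        for nxt in affinity.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:  # back edge: the chain returns to one of its own cells
                return True
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in list(affinity))
```

A detector like this could support a Fontana-style policy of suppressing such structures, or simply flag them for inspection.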




Fig. 7a) and b) The evolution of the size of complexes with and without multiple 

7 Conclusions 

Several important works have marked Alife by their attempt to computationally
replicate the formation of complexes from simpler elements. However, because these
works are either based on formal but not practical computational constructs [3] or
lack sufficient precision on the computational mechanisms which
underlie them, large methodological gaps seem to remain, which the natural
computational friendliness and flexibility of OO computation can help fill. The work
described here attempts to take some very preliminary steps on the road to a more
complete, natural and coherent OO chemical environment. These first steps have
mainly been restricted to the definition of the basic classes, the way they connect to
each other, and the recursive computational structure underlying symmetric or
asymmetric complexes. Current and future developments of the software will allow
new complexes to be constructed through the cleavage of existing ones as well.
Whereas one obvious source of novelty in chemistry is the aggregation of
complexes, new aggregates, not existing before, can also appear following an internal
transitory re-organisation of a complex and its subsequent splitting.
Abstract. We have constructed a simple model of a proto-cell that simulates
stochastic dynamics of abstract chemicals on a two-dimensional
lattice. We have assumed that chemicals catalyze their reproduction
through interaction with each other, and that repulsion occurs between
some chemicals. We have shown that chemicals organize themselves into
a cell-like structure that maintains its membranes dynamically. Further,
we have obtained cells that can divide themselves automatically into
daughter cells.



1 Introduction 

The emergence of cells is one of the major transitions in the evolution of life.
When primitive self-replicators such as a hypercycle of RNA enzymes evolve into
a living cell, they must acquire membranes that separate them from their
noisy environment. It is well known that a hypercycle system can easily be broken
down by the occurrence of parasites. Compartmentalization of a hypercycle
system is the simplest way to avoid this disaster [2] [14] [15]. At the same time,
however, it should be noted that parasites can also drive the
increase of diversity and complexity of the replicator network [6]. In order to
examine the balance between stable reproduction and diversity, we should study
the relationship between internal replication and the cellular structure that
encloses it.

Many models of proto-cell structures have been proposed. For example, it is 
well known that long-chained fatty acids spontaneously form micelles or vesicles 
when submerged in water. Luisi and his group demonstrated experimentally 
the self-organization and self-reproduction of liposomes, and showed that such 
vesicles maintain self-replicating RNA within them [16]. Theoretical models for
the self-organization and self-reproduction of micelles have been well studied [1] [13].

There is another essential feature of cells, namely self-maintenance. Living
cells metabolize and sustain their membranes by themselves, and the boundaries
of cells are defined by the membranes. This mutual dependence enables the coevolution
of internal chemical networks and membranes. The coevolution
between the two is presumed to have taken place in the early stages of cell evolution. With
respect to this point, Ganti proposed a model for primitive life termed ‘the
chemoton’, which presents three indispensable functions of the proto-cell: it has
a metabolic cycle for assimilation; it maintains its membrane; and it replicates
its genetic information [3] [4]. Varela also insisted that the boundary of cells (i.e.
the cell membrane) must be organized and maintained by the cell itself [9] [10].
He presented a model, on a two-dimensional lattice, of an autopoietic cell that
can maintain its membrane.

Our purpose in the present study is to demonstrate how such primitive cells
can emerge and evolve from a simple chemical network. A model of self-maintaining
and self-replicating cells in one-dimensional space was proposed by
the same author [11]. In that model, we have shown that self-reproduction of the
cellular structure emerges spontaneously and that there are two distinct processes of
replication, showing potentially different heredities.

In this paper, we extend our previous model to two dimensions
and show that this system has the potential for further evolution.

2 A stochastic particle model 

We simulate a discrete space-time dynamics of chemicals in a two-dimensional
space, where chemicals catalyze each other’s reactions. Each chemical is represented
as a particle, with or without an anisotropic shape, that moves around on a triangular
lattice. Particles exhibit two basic motions: hopping to neighbouring sites
and rotating at one site. In addition, a particle can change its
chemical qualities. The former is termed a mobile transition, the latter a chemical
transition, and both are determined by the potential energy of the particle.

We assume that there is a repulsive force between some chemicals, so the
physical potential $E_C(x)$ of a chemical C at site x is computed by summing the
repulsion potentials of all chemicals at site x and its six neighbouring sites.
The mobile transition probability $P_C(x, x_0)$ from site $x$ to $x_0$ is computed
from the difference in the potential magnitudes as

$$P_C(x, x_0) = R_{\mathrm{dif}} \, f(E_C(x_0) - E_C(x)),$$

where $E_C(x)$ gives the potential energy of the particle C at site x. The
diffusion parameter $R_{\mathrm{dif}}$ is fixed for all particles.

The chemical transition probability $P_{C \to C'}(x)$ from state C to C' at
site x is given as

$$P_{C \to C'}(x) = R_{C \to C'}(x) \, f(G_{C'} - G_C + E_{C'}(x) - E_C(x)),$$

where $G_C$ represents the chemical potential of C. The reaction parameter
$R_{C \to C'}(x)$ is controlled by a catalyst present at site x. One constraint
is imposed on the form of the function f in order to satisfy the thermal
equilibrium condition, as follows:

$$\frac{f(\Delta E)}{f(-\Delta E)} = \exp(-\beta \, \Delta E), \qquad (1)$$

where $\beta$ is the inverse temperature.
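The thermal-equilibrium constraint only fixes the ratio f(ΔE)/f(-ΔE), which for detailed balance must equal exp(-βΔE) with β an inverse temperature; it does not fix f itself. The sketch below assumes one common choice that satisfies it, the heat-bath (Glauber) form f(ΔE) = 1/(1 + exp(βΔE)), with an illustrative `BETA = 1.0`. Neither the form of f nor the value of β is taken from the model, and the helper names `p_move` and `p_react` are hypothetical.

```python
import math

BETA = 1.0  # hypothetical inverse temperature (not given in the text)


def f(delta_e: float, beta: float = BETA) -> float:
    """Heat-bath (Glauber) acceptance function, an assumed choice of f.

    It satisfies f(dE) / f(-dE) == exp(-beta * dE) and is written in a
    numerically stable way so that large |dE| does not overflow."""
    x = beta * delta_e
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def p_move(r_dif: float, e_here: float, e_target: float) -> float:
    """Mobile transition: P_C(x, x0) = R_dif * f(E_C(x0) - E_C(x))."""
    return r_dif * f(e_target - e_here)


def p_react(r_cc: float, g_c: float, g_c_new: float,
            e_c: float, e_c_new: float) -> float:
    """Chemical transition: R_{C->C'}(x) * f(G_C' - G_C + E_C'(x) - E_C(x))."""
    return r_cc * f(g_c_new - g_c + e_c_new - e_c)


# Check the equilibrium constraint for a few energy differences.
for d in (-3.0, -0.5, 0.0, 0.5, 3.0):
    assert math.isclose(f(d) / f(-d), math.exp(-BETA * d))
```

Any other f with the same ratio, such as the Metropolis rule min(1, exp(-βΔE)), would satisfy the constraint equally well; the heat-bath form is used here only because it is smooth.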
We define five different kinds of chemicals, A, M, W, X and Y, in the system.
Each particle belongs to one of these chemicals. W plays the role of abstract
'water' and cannot change into any other chemical. X is a material with high
chemical potential, though it is not autocatalytic. A is the only autocatalytic
chemical in the system. The reactions involving A are as follows:



$$A + A \rightleftharpoons AA$$
and
$$X + AA \rightleftharpoons A + AA.$$

These chemical reactions occur only among particles that occupy the same
site¹. The forward and backward reactions have an equal reaction parameter,
which is given by the following formula:



$$R_{X \to A}(x) = R_{A \to X}(x) = B_{X \leftrightarrow A} + C_A \, A(x)^2,$$

where $A(x)$ denotes the number of A at site x. In the above equation,
$B_{X \leftrightarrow A}$ and $C_A$ are the base rate and the catalysis coefficient,
respectively. As a secondary process, A also produces M as a co-product of the
reaction network:



$$X + AA \rightleftharpoons M + AA.$$

The reaction parameters are given by

$$R_{X \to M}(x) = R_{M \to X}(x) = B_{X \leftrightarrow M} + C_M \, A(x)^2.$$

In addition to the above reactions, we introduce the natural decay of chemicals
into Y, which has the lowest chemical potential.

We assume that there is a source of material X in this system, so that the
reaction parameters between X and Y are asymmetric:

$$R_{X \to Y} = B_{X \leftrightarrow Y}, \qquad R_{Y \to X} = B_{X \leftrightarrow Y} + S_X,$$

where $S_X$ denotes the strength of the source of X.
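The site-local reaction parameters above can be collected in one function. This is a minimal sketch: the dictionary keys and the numerical values in the usage lines are hypothetical, and among the decay reactions only the X-Y pair is included, since the parameters of the other decays into Y are not given here.

```python
def reaction_parameters(a_count: int, p: dict) -> dict:
    """Reaction parameters R_{C->C'}(x) on one site, given A(x) = a_count.

    For X <-> A and X <-> M the catalytic term grows as A(x)**2, and the
    forward and backward parameters are equal. The source S_X makes the
    X <-> Y pair asymmetric."""
    r_xa = p["B_XA"] + p["C_A"] * a_count ** 2
    r_xm = p["B_XM"] + p["C_M"] * a_count ** 2
    return {
        ("X", "A"): r_xa, ("A", "X"): r_xa,
        ("X", "M"): r_xm, ("M", "X"): r_xm,
        ("X", "Y"): p["B_XY"],
        ("Y", "X"): p["B_XY"] + p["S_X"],
    }


# Usage with made-up parameter values.
params = {"B_XA": 0.01, "C_A": 0.001, "B_XM": 0.005, "C_M": 0.0005,
          "B_XY": 0.002, "S_X": 0.01}
rates = reaction_parameters(3, params)
```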

We assume there is a repulsive force between M and other chemicals, like oil in
water. In the following simulations, we examine three different kinds of potential
for the chemical M. First, M equally repels all the other chemicals around it.
Second, M has an anisotropic repulsion regardless of the kind of chemical; this
feature is described later. Third, in addition to the anisotropic repulsion, the
repulsion force also depends on the kind of chemical.

¹ Note that many particles can occupy each site. In this study, the average is
one hundred per site.
3 Simulation Results

3.1 Formation of Cells 

First, we simulate the case where the repulsion force depends neither on the
kind of chemical nor on the shape of M. Starting from a homogeneous initial state
rich in A, the system sustains the replication of A and the production of the
co-product M. Chemicals A and M aggregate to form Turing-like patterns. Figure 1.a
shows an example of the generated pattern: spots of M form among W and A.

The second case, where the repulsion force depends on the orientation of the
M molecule, behaves differently. Here M has an anisotropic potential, illustrated
in Fig. 1.b. When these Ms are placed on a triangular lattice, the 'head' can be
aligned in any of three directions (e.g. 0, π/3 and 2π/3). We assume that it can
change its direction stochastically, with the transition probability given by

$$P_{M \to M'}(x) = R_{\mathrm{rot}} \, f(E_{M'}(x) - E_M(x)),$$

where M' denotes M with another orientation.

Figure 1.b shows the repulsion potential generated by M for the other chemicals
A, W, X and Y. The repulsion from M is strongest when M and the other chemical
are on the same site (black in Fig. 1.b), second strongest in front of or behind
M (dark gray sites), and relatively weak at the other four side sites (light
gray). We also assume there is repulsion between Ms when their directions are
different. Thus, an M molecule tends to take the same direction as neighbouring Ms.
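The anisotropic field of Fig. 1.b can be tabulated per orientation. The sketch below assumes axial coordinates for the triangular lattice, with the six neighbours listed at angles 0, π/3, ..., 5π/3, and uses hypothetical strengths `V_SAME > V_AXIS > V_SIDE` for the black, dark gray and light gray sites, plus a hypothetical misalignment penalty `J_ALIGN`; the values used in the actual simulations are not stated here.

```python
# Axial offsets of the six neighbours of a triangular-lattice site, listed
# counter-clockwise at angles 0, pi/3, 2pi/3, pi, 4pi/3, 5pi/3.
NEIGHBOURS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]

# Hypothetical strengths: same site (black) > front/back (dark gray) > sides (light gray).
V_SAME, V_AXIS, V_SIDE = 3.0, 2.0, 1.0
J_ALIGN = 0.5  # hypothetical repulsion between neighbouring Ms of different orientation


def repulsion_field(orientation: int) -> dict:
    """Repulsion exerted on other chemicals by an M whose axis points along
    0, pi/3 or 2pi/3 (orientation 0, 1 or 2), keyed by offset from M's site."""
    if orientation not in (0, 1, 2):
        raise ValueError("orientation must be 0, 1 or 2")
    field = {(0, 0): V_SAME}
    for k, offset in enumerate(NEIGHBOURS):
        # Directions k and k + 3 are the front and back of the axis.
        field[offset] = V_AXIS if k % 3 == orientation else V_SIDE
    return field


def alignment_penalty(o1: int, o2: int) -> float:
    """Extra repulsion between two Ms with different orientations."""
    return 0.0 if o1 == o2 else J_ALIGN
```

Raising `V_AXIS` relative to `V_SIDE` or increasing `J_ALIGN` corresponds to the two knobs discussed next: membrane thickness and membrane straightness.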

When M has this kind of anisotropic repulsive force, the clusters are organized
differently from the isotropic case. The clusters of M become thin films, which we
simply call 'membranes' (see Fig. 1.c). The difference in the repulsion potential
between the front and side sites of M affects the thickness of the membrane.
Also, when the repulsion between Ms with different directions is stronger, the
membranes tend to run straighter. These effects allow us to obtain membranes with
various degrees of flexibility.

When we start from a single 'cell', that is, a spot of A enveloped by membranes
as shown in Fig. 1.d, this structure can maintain itself stably: the A inside the
cell keeps reproducing itself and sustains the membranes by supplying M, while
the membranes keep A from diffusing outward. Note that this structure collapses
when the membranes are broken (see Fig. 1.e). Chemical A cannot sustain its
reproduction because it leaks away through the defect in the membrane. Without
the supply of M from A, the membranes decay and disappear.



3.2 Cell division 

Living cells are not closed systems. They must ingest nutrients and excrete wastes
through their membranes. In this section, we study the case where M shows
selective repulsion depending on the kind of molecule. We assume that the
repulsion between M and the two chemicals X and Y is much weaker than that
between M and the other chemicals.

[Figure 1 panels: (d); (e.1) t=3000; (e.2) t=6000]

Fig. 1. Cluster formation. Picture (a) presents an example of the patterns formed
by molecules with isotropic repulsion. Picture (b) illustrates the anisotropic field
of repulsion around M; the depth of gray denotes the intensity of repulsion.
Picture (c) shows that M forms thin films and separates the domains of A. The
white domain is rich in A; the gray and black domains are dominated by W and M,
respectively. In picture (d), a cell structure maintains itself stably. Pictures
(e.1) and (e.2) are snapshots of the collapse of a cell that initially lacks its
upper right membrane.

In this case, X and Y can permeate through the membranes at a rate proportional
to the gradient of their density. Because the density of X is higher in the
environment than inside the cell, the cell absorbs external X and grows gradually.
When the cell grows beyond the size at which it is stable, it begins to generate
a new membrane inside, which finally divides the mother cell into daughter cells
(see Fig. 2). These new cells repeat the process of growing and dividing.
Sometimes a cell fails to sustain its membrane structure and dies, owing to a
shortage of materials or to interference from other cells.

We can change the flexibility of the membranes by altering the repulsion
strength between Ms, which in turn changes the division dynamics of the cells.
Examples are presented in Fig. 3. A strong repulsion between Ms results in stiff
membranes, and the shape of the cells becomes more regular (Fig. 3.a and 3.b).
On the other hand, Fig. 3.c shows cells with more flexible membranes, which form
when the repulsion between Ms is weak. These cells divide irregularly at narrow
parts.




[Figure 2 panels: (a.1) t=3000, (a.2) t=6000, (a.3) t=9000, (a.4) t=12000, (a.5) t=15000, (a.6) t=18000]

Fig. 2. Snapshots of cell division. The cell grows by ingesting X. Next, the membrane
grows inward. Finally, the mother cell divides into five daughter cells.



4 Discussion 

We have presented a model of a self-maintaining cell. The cell has an internal
autocatalytic cycle of chemicals that maintains the membrane, while the membrane
keeps the cell from collapsing. We have also shown that the self-maintaining cell
can replicate itself spontaneously; a transition is made from molecular
reproduction to cellular reproduction.

In real life, the earliest membranes may have been simpler and rougher than
phospholipid membranes. The marigranule [5] is an example of a primitive cell: it
has a rough shell and can ingest amino acids from the environment. Although
marigranules consist of very elementary molecules, there is no linkage between
the organization of their shells and their internal dynamics. If such a structure
established a symbiotic relationship between its embedded chemical network and
its membrane, the membrane could become a target of Darwinian selection and
evolve into more complex structures.

Fig. 3. Variations of the cell. The shapes of cells and their manner of division
depend on the strength of repulsion between Ms. Pictures (a) and (b) show cells
with stiff membranes. Picture (c) shows cells with flexible membranes.

In this study, we have shown that the manner of cell division changes with the
interaction strength between Ms. Similarly, some properties of biological
membranes can be changed by varying their components. For example, unsaturated
fatty acid components soften the membrane, and cholesterol does the opposite.

Self-replicating spots in two-dimensional reaction-diffusion systems [12]
[8] have been well studied. However, such replicating patterns cannot be as
diverse as the patterns generated by cells with membranes. The cellular membrane
can function as a boundary condition for the internal chemical network,
and conversely, the internal reactions determine the cell shape. In this sense, the 
chemical reactions within cell membranes can be richer than those without mem- 
branes. The compartmentalization of chemicals will allow cells to be regarded 
as units of evolution, because it maintains the identity of their contents during 
reproduction. 

Koch previously discussed the division mechanisms of phospholipid vesicles
by considering their mechanical energy [7]. Our model demonstrates an analogous
dynamic division mechanism.

The evolution of selective permeability of the membrane must be considered 
in future studies. Cell membranes determine how cells communicate with the 
environment, including other cells. Cells selectively receive stimuli from the
environment and from other cells, and respond to these stimuli. Our cell model
offers several possible ways to observe the formation and evolution of membrane
functions, and how the interaction between cells is generated by each cell's own
internal dynamics.
Abstract. This paper looks at concerns for very early evolution near
the origins of life. At the very least, the objects that are our most distant
ancestors must have been objects that could persist for some time and
replicate. The models presented here examine the evolution of such proto-life
objects, whose only characteristics are how long they exist and how quickly
they can replicate. No explicit selection regime is needed; instead, these
objects evolve due to their own simple dynamics until a persistence error
threshold is reached. The last part of the paper discusses ways in which this
error threshold can first be moved and then overcome, by the evolution of
mutation rates and of predation, respectively.



1 Introduction 

This paper is part of ongoing research into how the notion of persistence can
be usefully applied to dynamical systems, especially those capable of modelling
living systems. This research takes persistence to be a fundamental and
important characteristic of objects within the system. For an object to be able
to replicate or perform other interesting behaviours, it must at least persist
long enough to perform those behaviours. Hence the ability to persist is seen as
a prerequisite for more complex behaviours. If the objects can replicate, then a
persistence ratchet [1] may be in place, and hence effective evolution can occur.

The models presented here take a preliminary look at the conditions necessary
to enable persistence ratchets at the very origins of life. They do not address
the question of how replication emerged; rather, they look at the dynamics of
evolution immediately after replication first became possible. In particular,
the first model assumes that no predation occurs and that the proto-life objects
have no significant influence on their environment. Hence the only
characteristics of these simple units of replication are their persistence time
and their replication rate.

Whilst the model presented here is best seen in the context of the nude gene
or proto-cell versions of the origin of life, similar concerns arise about the
ability of 'replicating' autocatalytic cycles to persist (for details of current
theories about the origins of life see [2], [3] or [4]). In any event, if life is
seen as requiring a unity, as in the autopoietic view [5], then this model is
based on the initial persistence ability of these units.
2 The model and first results

The world is made out of a number of ‘cells’. Each cell contains one object 
that is described simply by its ability at persisting and replicating. Its ability 
at persisting is a number that represents how many time steps it is capable of 
existing for. If this number is ever 0 then the cell is considered to be empty as 
no object can exist for no time. Replication ability is represented as the number 
of time steps between replication events. Each object also has an associated age 
counter and replication counter, counting the time since birth and since the last 
replication event respectively. These counters are not globally synchronised but 
are locally measured by the number of update events that each object has had 
applied to it. Update events are stochastically distributed across the cells. One 
global time unit is said to have occurred every N update events. 

When the age counter reaches the value of the persistence ability, the object
dies and its persistence value (and other variables) are set to zero. When the
replication counter reaches the replication ability value (or higher),
replication is attempted. If there are any empty cells, an exact copy of the
replicating object is placed in an empty cell, and the offspring's counters as
well as the parent's replication counter are set to zero. If there are no empty
cells, nothing happens. In the first experiments presented here, the replication
ability is set so that a fixed number, X, of offspring is attempted per lifetime.
So an object with persistence ability Y always has a replication ability of Y/X.

A mutation operator is applied to each cell when it is updated and can 
either increase or decrease the persistence value of the object in the cell. In the 
first experiments the replication ability is also changed accordingly. Negative 
mutations are set initially at an average of 1 mutation every 10 time units. 
Positive mutations occur at an average of 1 mutation every 100 time units. A 
mutation that takes the value of a cell from 0 to 1 has ‘created’ a replicating 
object from an empty cell. Each run starts with N empty cells. 
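The update rules above can be put together in a short simulation. This is a sketch under stated assumptions, not the author's code: the order of operations within one update (counters, replication attempt, death check, then mutation), the unit size of each mutation, the random choice of the empty cell, and the reading of the mutation rates as per-update probabilities (0.1 negative and 0.01 positive, since each cell is updated about once per time unit) are all our assumptions. The names `Cell`, `update` and `run` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Cell:
    persistence: int = 0  # time steps the occupant can exist for; 0 = empty
    age: int = 0          # update events since birth
    rep_counter: int = 0  # update events since the last replication


def update(cells, i, offspring_per_life, p_neg, p_pos, rng):
    """Apply one update event to cell i (assumed order: counters,
    replication attempt, death check, mutation)."""
    c = cells[i]
    if c.persistence > 0:
        c.age += 1
        c.rep_counter += 1
        # Replication ability Y / X: a fixed number X of attempts per lifetime.
        if c.rep_counter >= c.persistence / offspring_per_life:
            empty = [j for j, other in enumerate(cells) if other.persistence == 0]
            if empty:  # with no empty cell, nothing happens
                cells[rng.choice(empty)] = Cell(c.persistence)
                c.rep_counter = 0
        if c.age >= c.persistence:  # death: the cell becomes empty
            c = Cell()
            cells[i] = c
    # Mutation, applied to every updated cell; 0 -> 1 'creates' an object.
    if c.persistence > 0 and rng.random() < p_neg:
        c.persistence -= 1
    if rng.random() < p_pos:
        c.persistence += 1
    if c.persistence == 0:
        c.age = c.rep_counter = 0


def run(n=200, time_units=100, offspring_per_life=3.0,
        p_neg=0.1, p_pos=0.01, seed=0):
    """Simulate n initially empty cells; one time unit = n update events.
    Returns the mean persistence of the occupied cells at the end."""
    rng = random.Random(seed)
    cells = [Cell() for _ in range(n)]
    for _ in range(time_units * n):
        update(cells, rng.randrange(n), offspring_per_life, p_neg, p_pos, rng)
    alive = [c.persistence for c in cells if c.persistence > 0]
    return sum(alive) / len(alive) if alive else 0.0
```

Because the replication ability is recomputed from the current persistence, a mutation changes it "accordingly" as described. Scanning the whole list for empty cells keeps the sketch short but makes each update O(N).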

Figure 1a) shows how the model behaves when N = 1000 and the replication
ability is set to attempt 3 offspring per lifetime. The three lines on the graph
represent the best, average and worst levels of persistence of the objects at a
given time during the run. As can be seen, the 'population' of objects soon
reaches a persistence error threshold [6] beyond which it is unable to evolve.

A simple explanation of this error threshold can be gained by looking at the
necessary requirements to maintain a persistence ratchet [1]. A persistence
ratchet is said to be in place when the replication rate is higher than the net
negative mutation rate, as this guarantees that the level of persistence will at
least not decrease and is likely to increase, on average, over time. Once there
are no empty cells, no object can replicate until another object dies, and hence
the average replication rate falls to the average death rate, which here is
determined only by the average ability at persisting. The mutation rate cannot
be altered in this first model, so to maintain the persistence ratchet the
replication rate must be kept high by keeping the death rate high, which in turn
requires the persistence ability to be low. Hence there is a level of persistence
above which the persistence ratchet will not be in place, because replication
cannot occur fast enough to ensure that negative mutations do not accumulate.
We can therefore calculate the threshold by requiring that the average
persistence ability be less than the expected time between net negative
mutations.

Fig. 1. Getting closer to the persistence error threshold

So, with the mutation rates given above (a net negative mutation rate of
1/10 - 1/100 = 9/100 per time unit), the expected average persistence must be
less than 100/9, i.e. about 11 units of time. Note that this value is
independent of the replication ability. Figure 1b) shows the average persistence
level reached when the attempted number of offspring per lifetime is changed from
1.5 to 9; the final point comes from attempting replication every time step. As
the number of offspring per lifetime is increased, the average level of
persistence reached gets closer to the calculated threshold of 11 units of time.
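The threshold calculation can be checked with exact arithmetic. In this sketch the function name `persistence_threshold` is ours; it simply inverts the net negative mutation rate, as the argument above does, and treats a non-positive net rate as giving no threshold from this argument.

```python
from fractions import Fraction


def persistence_threshold(neg_interval: int, pos_interval: int):
    """Expected time between net negative mutations, given the average number
    of time units between negative and between positive mutations."""
    net_rate = Fraction(1, neg_interval) - Fraction(1, pos_interval)
    if net_rate <= 0:
        return None  # no net negative drift, so this bound does not apply
    return 1 / net_rate


t = persistence_threshold(10, 100)
print(t, float(t))  # 100/9 11.11111111111111
```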

Figure 1c) shows the average persistence level reached against different net
negative mutation rates, with each line representing a different replication
rate. Both graphs show that changing the replication rate lets the expected
level of persistence get closer to, but never exceed, the theoretically
calculated error threshold. Because we want to examine what does affect this
error threshold, we will, for simplicity, set the replication ability to the
maximum possible (one attempt per time step) for the rest of the paper.

Figure 1d) shows the effects of changing the number of available cells, N, on
the distribution of persistence values achieved. As N is increased, the average
persistence level reached gets closer to, but never exceeds, the calculated error
threshold. Interestingly, the maximum persistence level in the population does
exceed the threshold. So one way to achieve a higher level of persistence in some
of the population would be to increase the available space.
Fig. 2. Moving, then overcoming, the persistence error threshold



3 Adding to the model 

As a first attempt to get beyond this error threshold, we make the mutation rate
evolvable. In other words, objects can evolve to protect themselves against some
negative mutations. Each object holds a variable for the average number of time
steps between its own negative mutations. The value is copied during replication
events, and mutated during update events, both positively and negatively, at the
rate it represents¹. This value is not allowed to exceed 50 time steps, half the
fixed value for positive mutations.
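A per-object mutation interval with the stated cap can be sketched as follows. The step size of ±1 per mutation and the lower bound of 1 are our assumptions; the cap of 50 and the fixed positive interval of 100 come from the text. Applying the same bound as in the first model (average persistence below the expected time between net negative mutations) at the cap gives 1/(1/50 - 1/100) = 100 time units, which is our back-of-the-envelope reading rather than a figure reported by the author.

```python
import random
from fractions import Fraction

POS_INTERVAL = 100      # fixed average time between positive mutations
MAX_NEG_INTERVAL = 50   # cap on an object's own negative-mutation interval


def mutate_interval(interval: int, rng: random.Random) -> int:
    """With probability 1/interval (the rate the value represents), move the
    interval one step up or down (assumed step size), then clamp to [1, 50]."""
    if rng.random() < 1.0 / interval:
        interval += rng.choice((-1, 1))
    return max(1, min(interval, MAX_NEG_INTERVAL))


# Bound on average persistence at the cap, using exact arithmetic.
bound_at_cap = 1 / (Fraction(1, MAX_NEG_INTERVAL) - Fraction(1, POS_INTERVAL))
```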

The effect on the dynamics of the system is clearly visible in Figure 2a).
Through evolution, the objects are able to 'push' the error threshold to a
higher limit by making the net negative mutation rate as low as possible, and
thereby reach new heights of persistence. Whilst this is clearly a useful
innovation, there is still a persistence error threshold to get beyond.

Next we look at what happens when objects are able to 'eat', or rather take the
cell of, other objects. More specifically, each object has an associated can_eat
variable and is allowed to 'eat' and take the cell of any object whose ability at
persisting is not larger than this can_eat value. The variable's value is copied
during reproductive events and is subject to the same mutation rates as the
persistence value, and hence can evolve. With this added ability to evolve
predation (Figure 2b), there is no upper limit to the level of persistence that
is attainable within this model. This is because a population of objects with
persistence value X that can eat objects of value X - 1 will be able to rapidly
replace all negatively mutated objects with 'healthy' objects of value X. So the
persistence level cannot decrease below X, and positive mutations will easily be
able to take hold, allowing persistence levels to increase without bound.

¹ To do otherwise would allow a net negative mutation rate to feed back on itself.
This is a serious issue that must be addressed in more detail in future models.

For real organisms, who can eat whom is much more complex than modelled here.
Hence predation in real evolution will not necessarily allow ever-increasing
levels of persistence. However, the point of this model is to show how predation
can remove the importance of the persistence error threshold.
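The eating rule can be sketched as a choice of target cell for an offspring. How eating interacts with replication is not spelled out above, so this is only one possible reading: the sketch assumes that an object first looks for an empty cell and otherwise takes the cell of an edible object, i.e. one whose persistence is not larger than its `can_eat` value. The function name and the list-of-persistences representation are hypothetical.

```python
import random
from typing import List, Optional


def choose_target(persistences: List[int], can_eat: int, self_index: int,
                  rng: random.Random) -> Optional[int]:
    """Index of the cell an offspring would occupy: an empty cell (value 0) if
    any exists, otherwise a cell whose occupant has persistence <= can_eat.
    Returns None when neither exists."""
    empty = [j for j, p in enumerate(persistences) if p == 0]
    if empty:
        return rng.choice(empty)
    edible = [j for j, p in enumerate(persistences)
              if j != self_index and p <= can_eat]
    return rng.choice(edible) if edible else None


# A population of value 5 that can eat value 4 finds the negative mutant.
print(choose_target([5, 5, 4, 5], can_eat=4, self_index=0, rng=random.Random(0)))  # 2
```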

Putting these together, we can form one overarching picture of what has been
achieved in these models. In this last model, we assume that a certain minimum
level of persistence is required to enable complex behaviours. When objects can
persist for 15 or fewer time steps, they can evolve neither their mutation rate
nor their predation ability; they are too simple. When they exceed 15, they can
evolve their mutation rate. Once over the persistence level of 80, they can
finally evolve their ability at predation. Figure 2c) shows the resulting
dynamics of the system. Together, these models give some indication of how early
evolution was able to push back and then overcome the early persistence error
threshold. Once predation has evolved, the abilities to predate and/or to survive
in a world of predators become more important factors in the evolutionary story
than simply how long an object can persist on its own. The picture is summarised
in Figure 2d).
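The staged abilities of this last model reduce to two thresholds on persistence. A minimal sketch, in which the function and trait names are hypothetical:

```python
def evolvable_traits(persistence: int) -> set:
    """Traits an object may evolve, given its persistence (thresholds 15 and 80)."""
    traits = set()
    if persistence > 15:
        traits.add("mutation_rate")
    if persistence > 80:
        traits.add("predation")
    return traits


print(sorted(evolvable_traits(10)), sorted(evolvable_traits(40)), sorted(evolvable_traits(90)))
# [] ['mutation_rate'] ['mutation_rate', 'predation']
```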

4 Conclusions 

The minimum requirement on proto-life objects, that they can persist and repli- 
cate, already gives rise to a persistence error threshold that must be overcome. 
Replication rate and available space can bring a population closer to the theoret- 
ical threshold, but neither can change it. Evolvable mutation rates can make the 
threshold higher, whilst the ability to predate removes the threshold altogether. 
Therefore, predation must have been an important innovation in early evolution. 
Future work will analyse these models in more detail. 
Abstract. An evolutionary core memory system is proposed which simulates
reactions between biological molecules in membrane-bound compartments in a
solvent. Imitating biological substances, every word in the core is classified as
a Membrane, Nutrition, DNA, or Protein, and the actions of the core proceed by
metabolic reactions catalyzed by Proteins. The core is partitioned into small
sections by Membranes. A creature in this space is represented by a region
delimited by a pair of Membranes, and if it has an appropriate DNA sequence and
initial Proteins, it can reproduce itself and dominate the core. In an
experiment, a small ancestor creature designed by the author is inoculated into
the core and made to evolve. Creatures able to replicate DNA more efficiently
than the designed ancestor are successfully bred.



1 Introduction 

Recently, Suzuki proposed a core memory system named 'Semar' (the sea of
matter), which establishes correspondences between core words and biological
substances [1,2]. He compared an operator (an instruction) to a protein in a
cell and, following this analogy, prepared four types of data words modelled on
biological molecules: Membrane (MEM), Nutrition (NUT), DNA (DNA), and Protein
(PRO). A program (creature) in the core was delimited by Membrane data and made
to evolve with genetic algorithms (GAs) according to fitness values calculated
from the excreted data. With this system, he succeeded in breeding programs able
to solve a problem prepared in the environmental section. However, the system
was unsatisfactory in two respects. First, since GAs were used to evolve the
programs, the creatures could not choose their own reproduction units by
themselves, and the program length (the number of instructions included in a
creature) was fixed over a simulation run. Second, the actions of the core were
driven not only by Proteins but also by DNA. This was evidently more complicated
than a biological system, in which metabolic reactions are catalyzed only by
enzymes (proteins).

The aim of this paper is to remedy the above drawbacks of the previous work and
to present a more life-like system on the Semar core. All creatures are put into
one core, and the instructions necessary for the replication and transcription of
DNA are provided. The actions of the creatures are accomplished only by Proteins,
and the outer GA loop is eliminated.
2 The Model

Semar’s distinctive features are summarized as follows. 

- The core is addressed with logical addresses, which are maintained separately
from physical ones; this allows the insertion and deletion of arbitrary numbers
of data words.

- All data words are classified into four groups: Membrane (MEM), Nutrition
(NUT), DNA (DNA), and Protein (PRO).

- The core is hierarchically partitioned by Membrane data, and almost all
actions (computational operations) are confined to a section.

- All actions in the core are activated by Proteins, which are created by the
transcription of DNA units. The Proteins act in parallel and independently of
each other.

Each word in the core is represented by a 64- to 192-bit string composed of
16-bit data units, called 'header', 'type', 'label', 'value' or 'mcd' (machine
code), and 'temps' (templates), in that order. These units are denoted by
mnemonics separated by colons. The mcd of Membrane data is 'Bgn' or 'End'. The
core range delimited by a pair of Membranes ([MEM:Bgn] and [MEM:End])
constitutes a section.
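Because sections are delimited by matched [MEM:Bgn] and [MEM:End] words and may be nested, they can be recovered from the linear core with a stack, much like matching brackets. In this sketch a word is reduced to a hypothetical `(kind, mcd)` pair; the real words also carry header, label and template units.

```python
from typing import List, Tuple

Word = Tuple[str, str]  # (kind, mcd), e.g. ("MEM", "Bgn") or ("DNA", "Prom")


def sections(core: List[Word]) -> List[Tuple[int, int, int]]:
    """Return sorted (begin, end, depth) triples for every matched
    [MEM:Bgn] ... [MEM:End] pair; depth 0 marks an outermost section."""
    stack: List[int] = []
    found: List[Tuple[int, int, int]] = []
    for i, (kind, mcd) in enumerate(core):
        if kind != "MEM":
            continue
        if mcd == "Bgn":
            stack.append(i)
        elif mcd == "End":
            if not stack:
                raise ValueError(f"unmatched MEM:End at index {i}")
            found.append((stack.pop(), i, len(stack)))
    if stack:
        raise ValueError(f"unmatched MEM:Bgn at index {stack[-1]}")
    return sorted(found)
```

Nesting depth is what makes the core hierarchical: a Protein's influence is limited to the words between the pair of Membranes that encloses it.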

The actions of a Protein are as follows. First, the Protein searches for ligand
Nutrition using bit-matching between a Nutrition's label and the Protein's temp.
Like ligand molecules that regulate allosteric proteins in a living cell, the
selected ligand data act as an activator or inactivator of the Protein and change
the Protein's type to 'Actv' or 'Inac'. After this step, if the Protein's type is
'Actv', the Protein is put into action: it searches for operand data using
bit-matching and changes, copies, creates, moves, or deletes the operands
depending on its function. In one clock (the unit of time in the core), a Protein
can modify all matched operands in the same section; however, its influence does
not extend beyond Membranes. Although all Proteins in the core are in practice
executed one by one, their actions are logically parallel, which is achieved
using marks attached to the words.
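The idea of executing Proteins one by one while keeping their actions logically parallel can be illustrated with a two-phase update: every active Protein first marks what it would change, reading only the state at the start of the clock, and all marks are committed afterwards. This is a simplified sketch: words are dictionaries, a Protein is reduced to a hypothetical matching predicate plus a new `type` value, and only in-place type changes (not copies, insertions or deletions) are modelled.

```python
from typing import Callable, Dict, List

Word = Dict[str, str]


def run_clock(section: List[Word], proteins: List[dict]) -> List[Word]:
    """One clock inside one section. Each active Protein marks the words it
    matches against the unchanged section; marked changes are applied only
    after all Proteins have been visited, so the visiting order does not
    affect which words are matched."""
    marks = []  # (word index, new type)
    for prot in proteins:
        if prot["type"] != "Actv":
            continue  # an inactive Protein does nothing this clock
        matches: Callable[[Word], bool] = prot["matches"]
        for i, word in enumerate(section):
            if matches(word):
                marks.append((i, prot["new_type"]))
    result = [dict(w) for w in section]  # leave the input section untouched
    for i, new_type in marks:  # conflicting marks: the last one wins here
        result[i]["type"] = new_type
    return result
```

For example, one Protein that turns 'Utrn' words into 'Trns' and another that does the reverse simply swap the two types in a single clock, whichever runs first; a sequential update would instead leave both words with the same type.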

Table 1 shows the 20 elementary instructions (machine codes, or genes) that the
author has prepared. They are classified into three groups. The first group (RC_o
and RC_e) consists of receptors for regulatory Proteins. DNA:RC_o and DNA:RC_e
are not transcribed and serve only as start and end signals for consecutive DNA
units whose types are regulated simultaneously. (Following biology, the author
calls such a set of DNA an 'operon'.) The type of DNA is either 'Utrn' or 'Trns',
and DNA can be transcribed to create a Protein only when its type is 'Trns'. The
second group (Prom and Repr) consists of regulatory instructions. Like regulatory
genes in a living cell, when transcribed, DNA:Prom and DNA:Repr create PRO:Prom
and PRO:Repr, respectively. These Proteins search for the matched DNA:RC_o or
DNA:RC_e (operands) and change the type of the operons. The third group (creN to
CHoP) consists of structural instructions. After being transcribed, a structural
Protein modifies operand data according to its defined function.
Mnemonic  Operand    Function as a Protein
RC_o      (none)     (Not transcribed to a Protein.)
RC_e      (none)     (Not transcribed to a Protein.)
Prom      DNA:RC_o   Changes the type of operons into 'Trns'.
Repr      DNA:RC_e   Changes the type of operons into 'Utrn'.
creN      MEM:Bgn    Creates NUT after MEM:Bgn.
creS      MEM:Bgn    Creates a new inner section after MEM:Bgn.
prey      MEM:Bgn    Converts (Bgn,-,End,Bgn) to (Bgn,Bgn,-,End).
bred      MEM:End    Moves the operand inner section before MEM:Bgn.
invd      MEM:End    Converts (End,Bgn,-,End) to (Bgn,-,End,End).
dgst      MEM        Deletes Bgn and End delimiting an inner section.
ctbN      NUT        Deletes operand NUTs.
CHiN      NUT        Copies operand NUTs to the inner section.
CHoN      NUT        Copies operand NUTs to the outer section.
trDP      DNA        Transcribes DNA to PRO:Inac after the sequence of DNAs.
ctbD      DNA        Converts operand DNA to NUT with value zero.
CHiD      DNA        Copies operand DNAs to the inner section.
CHoD      DNA        Copies operand DNAs to the outer section.
ctbP      PRO        Converts operand PRO to NUT with value zero.
CHiP      PRO        Copies operand PROs to the inner section.
CHoP      PRO        Copies operand PROs to the outer section.

Table 1. The basic instruction set (gene set)
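Operon regulation and the restriction of transcription to 'Trns' units can be sketched as below. This is a simplification under stated assumptions: an operon is taken to be the DNA units strictly between a DNA:RC_o and the next DNA:RC_e, a regulatory Protein acts on the operons chosen by a hypothetical `selected` predicate instead of real template matching, and `transcribe` only mimics the type check of trDP, not where the new PRO:Inac words are placed.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Unit = Dict[str, str]  # a DNA word reduced to its 'mcd' and 'type'


def operons(dna: List[Unit]) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) slice bounds of the units between an RC_o and the next RC_e."""
    start = None
    for i, unit in enumerate(dna):
        if unit["mcd"] == "RC_o":
            start = i + 1
        elif unit["mcd"] == "RC_e" and start is not None:
            yield start, i
            start = None


def regulate(dna: List[Unit], new_type: str,
             selected: Callable[[int], bool] = lambda k: True) -> List[Unit]:
    """PRO:Prom (new_type 'Trns') or PRO:Repr (new_type 'Utrn') acting on
    the selected operons; returns a modified copy."""
    out = [dict(u) for u in dna]
    for k, (s, e) in enumerate(operons(dna)):
        if selected(k):
            for unit in out[s:e]:
                unit["type"] = new_type
    return out


def transcribe(dna: List[Unit]) -> List[str]:
    """Mnemonics of the Proteins (created as PRO:Inac) that trDP could make:
    only units of type 'Trns' are transcribed, and receptors never are."""
    return [u["mcd"] for u in dna
            if u["type"] == "Trns" and u["mcd"] not in ("RC_o", "RC_e")]
```

Switching an operon on with Prom and off again with Repr only changes which units `transcribe` reports; units outside any operon are never affected.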



The simulation of the Semar core is begun by inoculating the core with the ancestor creature designed by the author and a few words of environmental Nutrition. The ancestor can reproduce itself every three clocks. After the start, the ancestor and its offspring reproduce themselves and the size of the core grows swiftly. The passage of time in a run is measured in clocks, and at each clock the following three operations are executed sequentially: the operations of Proteins, mutation, and weeding. Mutation is applied over each interval of time in which the size of the core doubles. When this happens, a bit is chosen randomly from among all of the core words (which include all significant data from the header to templates) and inverted. This process is repeated an appropriate number of times so that the rate of bit inversions per bit per clock equals the constant value u. The weeding operation is executed only when the size of the core exceeds the predefined constant CPTreap0. When this happens, the outermost sections (sections that are not included in other sections) are chosen and eliminated from the core until the total size of the core becomes smaller than the constant CPTreap1.
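The mutation and weeding steps can be sketched as follows. This is a minimal illustration, not the Semar implementation: the core is modelled as a flat list of integer words with a hypothetical width of 32 bits, the number of inversions is rounded from u × (total bits) × (elapsed clocks), and the outermost sections to eliminate are picked at random, because the text does not say how they are chosen.

```python
import random

def bit_flips_needed(u, total_bits, clocks):
    """Inversions needed so that the rate per bit per clock equals u."""
    return round(u * total_bits * clocks)

def mutate(core_words, u, clocks, word_bits=32, rng=random):
    """Invert randomly chosen bits of integer core words in place.

    word_bits is a hypothetical word width; the paper does not state it.
    """
    total_bits = len(core_words) * word_bits
    for _ in range(bit_flips_needed(u, total_bits, clocks)):
        pos = rng.randrange(total_bits)
        word, bit = divmod(pos, word_bits)
        core_words[word] ^= 1 << bit  # flip one bit
    return core_words

def weed(section_sizes, reap0, reap1, rng=random):
    """Eliminate outermost sections until the core is smaller than reap1.

    Runs only when the total size exceeds reap0 (CPTreap0 in the text).
    Removing sections in random order is an assumption.
    """
    sizes = list(section_sizes)
    if sum(sizes) <= reap0:
        return sizes
    while sizes and sum(sizes) >= reap1:
        sizes.pop(rng.randrange(len(sizes)))
    return sizes
```

With the values used in Section 3, weed(sizes, 24576, 19661) trims a core that has grown past 24576 words back below 19661 words.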

3 Results of experiments 

Figure 1 shows a typical result of the simulation using u = 3.47 x 10“®, CPTreap0 = 24576, and CPTreap1 = 19661. According to Fig. 1(a), the average amount of
[Fig. 1 (a): average number of DNA units per outermost section, rate of increase in the core size, and rate of increase in the core size by the ancestor, plotted against clock. (b) to (d): histograms over the number of DNA units included in an outermost section.]

Fig. 1. A typical result of the simulation. (a) The thick and thin curves are the results of the simulation, and the horizontal straight line is the theoretical value for the ancestor (∛2 − 1 = 0.26). (b) to (d) are histograms showing the distribution of the amount of DNA included in an outermost section at each clock.



DNA per outermost section begins at 20 (the amount of DNA included in the ancestor), increases, and reaches about 90 after 500 clocks. The rate of increase in the core size is also higher than the value for the ancestor; hence, we can conclude from this graph that the self-reproducing ability of the creatures improves during the run. Figures 1(b) to (d) suggest the mechanism for this improvement. After 300 clocks, the original creature (which holds 20 DNA units) and its variants are completely lost from the core, and the core becomes dominated by creatures that hold 40 DNA units (twice as many as 20). A similar improvement happens after about 400 clocks. In the history of evolution, the amount of DNA in a section is doubled or quadrupled.

To see this improvement more closely, several creatures were chosen dur- 
ing the run and bred in a blank core without mutation. The results of these 
experiments are shown in Table 2. The creatures chosen from the core at clock 
number 300/500 were able to bear two/four daughter sections every three clocks. 
Chosen from the core   Number of DNA    Initial growth of the      Rate of replication of
at clock number        units included   section numbers            DNA units per clock
0 (ancestor)           20               1, 1, 1, 2, 2, 2, 3, ...   ∛2 − 1 = 0.26
300                    40               1, 1, 1, 3, 3, 3, 5, ...   ∛3 − 1 = 0.44
500                    80               1, 1, 1, 5, 5, 5, 9, ...   ∛5 − 1 = 0.71



Table 2. Typical self-reproducing creatures chosen from the experiment shown in Fig. 1. The initial growth of the section numbers represents the number of outermost sections at the first clock, at the second clock, and so on.



This multiple reproduction was achieved by the execution of multiple PRO:bred proteins held in a section.
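The last column of Table 2 is consistent with reading the rate as a per-clock growth factor minus one: if every section yields d daughters per three-clock cycle, the number of sections multiplies by 1 + d every three clocks, so r = (1 + d)^(1/3) − 1. The check below is our own arithmetic under that reading, not the author's code.

```python
def replication_rate(daughters, cycle_clocks=3):
    """Per-clock rate r with (1 + r) ** cycle_clocks == 1 + daughters."""
    return (1 + daughters) ** (1 / cycle_clocks) - 1

# Creatures from Table 2: daughters borne per three-clock cycle
for clock, daughters in [(0, 1), (300, 2), (500, 4)]:
    print(clock, round(replication_rate(daughters), 2))
```

This prints 0.26, 0.44 and 0.71 for clocks 0, 300 and 500, matching the table.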

The author concludes this section by describing an observation that clarifies the mechanism causing the doubling of DNA sequences. In the experiment of Fig. 1, the first section with a doubled DNA sequence appeared as follows. At clock 74, the mother section failed to breed its daughter, and the daughter remained inside the mother section after the breeding procedure. At clock 76, PRO:CHiD replicated DNA into this remaining daughter section, and the DNA sequence was doubled. At the next clock, the daughter section was bred. In this example, the doubling of the DNA sequence was caused by an error in breeding the daughter section.

4 Conclusion 

A novel ALife system called Semar was proposed and simulated. Using data words designed in imitation of biological substances, biological reactions were emulated in the core, and the breeding of creatures able to replicate their DNA more efficiently than the ancestor was successfully carried out. The mechanisms causing the evolutionary improvement were also discussed.
Abstract. The quasispecies theory is studied for dynamic replication landscapes. A meaningful asymptotic quasispecies is defined for periodic time dependencies. The quasispecies’ composition is constantly changing over the oscillation period. The error threshold moves towards the position of the time-averaged landscape for high oscillation frequencies and follows the landscape closely for low oscillation frequencies.



The quasispecies theory, put forward by Eigen in 1971 [1] and subsequently studied by Eigen, Schuster, McCaskill and coworkers [2, 3, 4], is nowadays one of the classical theories of self-replicating entities. Its prediction of an error threshold, above which self-replication ceases to produce useful offspring, has important implications for the origin of life. The error threshold effectively limits the amount of information the entities can carry, thus placing an upper bound on the complexity that self-reproducing information carriers can achieve without sophisticated error-correction mechanisms.

Although completely static environments are unrealistic in any case apart from experiments in perfectly controlled flow reactors, the quasispecies theory has so far been considered mainly in static replication landscapes. Moreover, even under fixed environmental conditions, the replication rates of RNA molecules, for example, can change because of changing concentrations of template and replica [5]. Jones [6, 7] has studied underlying time dependencies that are identical for all sequences. In contrast, we focus on replication landscapes with an individual time dependency for each sequence. One of the reasons for the neglect of individually changing replication coefficients in earlier work is probably the fact that for arbitrary temporal changes an asymptotic quasispecies cannot be defined. However, a meaningful definition is at hand for time-periodic replication landscapes, as we show below.

We start from the discretized form of Eigen’s evolution equation [8], linearized with the appropriate transformation [9, 10]. Due to space limitations, we cannot repeat the arguments leading to that equation here; for details about this calculation, the reader is referred to [4]. We use the same notation as used there. Additionally, we define the error rate $R = 1 - q$, which gives the probability that a single symbol is copied erroneously. The string length will be denoted by $l$ throughout this paper.
The vector of the unnormalized sequence concentrations $y(t)$ evolves according to

$$ y(t+\Delta t) = \left[\Delta t\, W(t) + 1\right] y(t) . \tag{1} $$

Here, $W(t)$ is the replication matrix $W(t) = Q A(t) - D(t)$. We assume that $W(t)$ is periodic with period $T = n\Delta t$, $n \in \mathbb{N}$, for some chosen discretization time step $\Delta t \ll T$. After iterating Eq. (1), we obtain for $t' = t + \zeta\Delta t$, $\zeta \in \mathbb{N}$,

$$ y(t') = \mathcal{T}\left\{ \prod_{\nu=0}^{\zeta-1} \left[\Delta t\, W(t+\nu\Delta t) + 1\right] \right\} y(t) , \tag{2} $$

where $\mathcal{T}\{\cdot\}$ indicates that the product has to be evaluated in the time order given by the iteration. With the definition of the matrix (or operator)

$$ X := \mathcal{T}\left\{ \prod_{\nu=0}^{n-1} \left[\Delta t\, W(\nu\Delta t) + 1\right] \right\} , \tag{3} $$

which maps $y(0)$ onto $y(T)$, we can write down the solution of the discretized differential equation Eq. (1) for the initial condition $y(0)$ as

$$ y(t) = \mathcal{T}\left\{ \prod_{\nu=0}^{\zeta-1} \left[\Delta t\, W(\nu\Delta t) + 1\right] \right\} X^{m}\, y(0) , \tag{4} $$

where the time $t$ has been subdivided into $t = mT + \zeta\Delta t$, with $\zeta < n$ and $m, \zeta \in \mathbb{N}$.

If we observe the system in time steps of the period length $T$, the system appears to evolve in a static replication landscape, which is defined by $X$. The asymptotic steady state for the oscillation phase $\zeta = 0$ is therefore given by the normalized Perron eigenvector $\phi_0$ of $X$ [11]. For $0 < \zeta < n$, the steady states are found by applying $\mathcal{T}\{\dots\}$ from Eq. (4) to $\phi_0$ and normalizing the result.
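Eqs. (3) and (4) translate directly into a small numerical sketch: the period map X is accumulated as a time-ordered product of one-step matrices (each new factor multiplied from the left), and the phase-zero steady state is its Perron vector, obtained here by power iteration. The two-sequence landscape (string length 1, master coefficient oscillating in the form of Eqs. (5) and (6), decay terms dropped) and every parameter value are illustrative assumptions, not the settings behind the figures.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def period_map(W, T, n):
    """X = T{ prod_{nu=0}^{n-1} [dt W(nu dt) + 1] } with dt = T / n (Eq. 3)."""
    dt = T / n
    size = len(W(0.0))
    X = [[float(i == j) for j in range(size)] for i in range(size)]
    for nu in range(n):
        Wt = W(nu * dt)
        step = [[dt * Wt[i][j] + float(i == j) for j in range(size)]
                for i in range(size)]
        X = matmul(step, X)  # later time steps multiply from the left
    return X

def perron_vector(X, iters=2000):
    """Dominant eigenvector of a positive matrix, normalized to sum 1."""
    v = [1.0] * len(X)
    for _ in range(iters):
        v = matvec(X, v)
        total = sum(v)
        v = [x / total for x in v]
    return v

# Illustrative two-sequence landscape (l = 1); all values are assumptions.
R, A0s, A, eps, T = 0.1, math.e, 1.0, 0.5, 10.0
q = 1.0 - R
Q = [[q, R], [R, q]]

def W(t):
    a = [A0s * math.exp(eps * math.sin(2 * math.pi * t / T)), A]
    return [[Q[i][j] * a[j] for j in range(2)] for i in range(2)]  # W = Q A(t)

X = period_map(W, T, n=1000)
phi0 = perron_vector(X)  # steady state at oscillation phase zeta = 0
```

Applying the first ζ one-step factors to phi0 and renormalizing would then give the steady state at phase ζ, as described above.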

Let us now study quantitatively the effects a periodic replication landscape has on the quasispecies. As a first step in that direction, we start from the Swetina-Schuster landscape [12] and introduce small oscillations in the master sequence’s replication coefficient $A_0$. For simplicity, we set all decay constants equal, $D_i(t) = D$, because they then drop out of Eq. (1) during the foregoing linearization.

We will write the time-dependent replication coefficient $A_0(t)$ as

$$ A_0(t) = A_{0,s} \exp\left[\epsilon f(t)\right] , \tag{5} $$

where $A_{0,s}$ is the replication coefficient in the static landscape, $f(t)$ is a $T$-periodic function and $\epsilon$ is the oscillation amplitude. For $\epsilon = 0$ the corresponding static landscape is recovered. The other replication coefficients are equal, $A_1 = \dots = A_l = A$, and constant. We will choose $A$ so small that the condition $A \ll A_0(t)$ is satisfied for all $t$, and $\epsilon \ll 1$. This ensures that we see a clear transition from the static case to the dynamic case and, additionally, that the changes in the master sequence’s abundance can be directly related to the changes in $A_0(t)$.

Fig. 1. The steady-state oscillations of the master sequence in cyclically changing environments with different oscillation periods $T$. Parameters used are $l = 2$, $A_{0,s} = A_1(t) = A_2(t) = 1$, t = 0.2.

One of the simplest forms the function $f(t)$ in Eq. (5) can take on is

$$ f(t) = \sin(\omega t) \quad \text{with} \quad \omega = 2\pi/T . \tag{6} $$

In the following, we briefly discuss the influence of different frequencies $\omega$ and amplitudes $\epsilon$ for this time dependency.

As a response to the oscillation of the replication coefficients, a modified oscillation is found in the concentration $x_0$ of the master sequence (Fig. 1). For increasing frequency $\omega$, the amplitude of the $x_0$ oscillation decreases and a phase shift develops. This behaviour is due to the finite time that a reaction system as described by Eigen’s equation needs to settle into equilibrium. In constant environments, the asymptotic species distribution is approached exponentially in time, with the relaxation time scale $\tau$ set by the difference between the largest and the second-largest eigenvalue of $W$. For oscillating environments, the relaxation time needs to be compared to the period $T$. If $T \gg \tau$, the system is virtually in equilibrium for arbitrary (asymptotic) times $t$, whereas for $T \approx \tau$ the changes cannot be tracked anymore, and phase shift as well as amplitude damping of the response set in. For $T \ll \tau$ the response amplitude is fully damped and the system becomes identical to one with the time-averaged replication coefficients. Interpreting the $A_0$ and $x_0$ time dependence as input and output signal, the system acts as a low-pass analog filter, in analogy to observations made in population genetics models with dynamic fitness landscapes [13, 14, 15, 16]. Moreover, for small $\epsilon$ the filter is linear, which means that a sinusoidal oscillation is found in $x_0$, whereas this linearity is quickly destroyed for increasing $\epsilon$.¹

¹ Details on this analysis can be found in [17].
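The low-pass behaviour can be reproduced with a plain iteration of Eq. (1): relax onto the periodic steady state, then record the master concentration x_0 over one period and take its peak-to-peak range. This sketch assumes the full binary sequence space of length l with point-mutation matrix Q_ij = q^(l−d) R^d (d the Hamming distance), the sinusoidal A_0(t) of Eqs. (5) and (6), and illustrative parameter values that are not those of Fig. 1.

```python
import math

def hamming(a, b):
    return bin(a ^ b).count("1")

def response_amplitude(T, l=2, R=0.05, A0s=math.e, A=1.0, eps=0.2,
                       dt=0.05, burn_in=200.0):
    """Peak-to-peak oscillation of x_0 in the periodic steady state.

    Iterates y(t + dt) = [dt W(t) + 1] y(t) with W = Q A(t) (decay dropped),
    renormalizing y after every step so it stays a concentration vector.
    """
    n = 2 ** l
    q = 1.0 - R
    Q = [[q ** (l - hamming(i, j)) * R ** hamming(i, j) for j in range(n)]
         for i in range(n)]
    steps = max(1, round(T / dt))
    dt = T / steps                      # an integer number of steps per period
    periods = max(1, math.ceil(burn_in / T))

    def advance(y, k):
        a = [A] * n
        a[0] = A0s * math.exp(eps * math.sin(2 * math.pi * k * dt / T))
        y = [y[i] + dt * sum(Q[i][j] * a[j] * y[j] for j in range(n))
             for i in range(n)]
        total = sum(y)
        return [v / total for v in y]

    y = [1.0 / n] * n
    for _ in range(periods):            # relax onto the periodic steady state
        for k in range(steps):
            y = advance(y, k)
    x0 = []
    for k in range(steps):              # record x_0 over one full period
        x0.append(y[0])
        y = advance(y, k)
    return max(x0) - min(x0)

for T in (1.0, 10.0, 100.0):
    print(T, response_amplitude(T))
```

With these values the relaxation time is of order one time unit, so the period T = 100 yields a visibly larger x_0 oscillation than T = 1, as expected for a low-pass filter.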
Fig. 2. The quasispecies distribution as a function of the error rate $R$. $A_{0,s}$ = e^’*, $A = 1$, $\epsilon = 2$, T 100. Two different oscillation phases are shown; left: $\zeta = n/2$, right: $\zeta = 0$.



We will now focus on the influence of a time dependency as given in Eq. (5) on the error threshold. In accordance with the above, we have to distinguish between different dynamic regimes. For $T \ll \tau$, a sharp error transition occurs at $R^*_{\langle\rangle}$, which denotes the error threshold of a system with the time-averaged $\langle A_0 \rangle = \frac{1}{T}\int_0^T A_0(t)\,dt$. In contrast, for $T \gg \tau$ a moving error transition can be found at approximately $R^*(t)$, which denotes for any given $t$ the error threshold in a constant landscape with $A_0 = A_0(t)$. $R^*(t)$ lies between $R^*_{\min}$ and $R^*_{\max}$, which correspond to $A_0 = \min_t A_0(t)$ and $A_0 = \max_t A_0(t)$, respectively. In the intermediate case $T \approx \tau$, the numerical simulations (see Fig. 2) show that the error threshold $R^*(t)$ oscillates within a smaller interval than $[R^*_{\min}, R^*_{\max}]$.
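A rough feel for the three thresholds can be obtained from the usual single-peak approximation that neglects back mutations: the master sequence survives while q^l A_0 > A, so R* ≈ 1 − (A/A_0)^(1/l). Evaluating this at the minimum, the time average and the maximum of A_0(t) from Eqs. (5) and (6) gives the band within which the threshold moves. This approximation and the parameter values are our own choices, not the numerical procedure behind Fig. 2.

```python
import math

def threshold(A0, A, l):
    """Approximate error threshold: solve q**l * A0 = A for R = 1 - q."""
    return 1.0 - (A / A0) ** (1.0 / l)

def threshold_band(A0s, A, eps, l, samples=10_000):
    """Thresholds for min, time-averaged and max A0(t) = A0s exp(eps sin wt)."""
    vals = [A0s * math.exp(eps * math.sin(2 * math.pi * k / samples))
            for k in range(samples)]
    mean_A0 = sum(vals) / samples  # uniform samples over one period
    return (threshold(min(vals), A, l),
            threshold(mean_A0, A, l),
            threshold(max(vals), A, l))

r_min, r_avg, r_max = threshold_band(A0s=math.e ** 2, A=1.0, eps=0.5, l=20)
print(f"{r_min:.3f} {r_avg:.3f} {r_max:.3f}")
```

The time-averaged value always lies strictly inside the band for ε > 0, which is the ordering the phase diagram relies on.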

These findings allow us to draw a phase diagram as displayed in Fig. 3. For low $T$, we observe the standard separation into an ordered phase (below the error threshold) and a disordered phase (above the error threshold). With increasing oscillation period $T$, a third, new phase appears between the two. In this phase, we observe, for a fixed error rate $R$, an alternation between a fully developed quasispecies and a completely disordered system. The population seems to be moving back and forth over the error threshold. Therefore, we call this new phase the temporarily ordered phase. Since a similar phase diagram can be expected for any periodic landscape with finite $R^*_{\langle\rangle}$ and $R^*_{\min} \neq R^*_{\max}$, we believe that such observations could also be made in typical AL simulations such as Tierra or Avida [18] provided with the appropriate replication landscape. For a finite population in a rugged landscape, the temporarily ordered phase would cause a random drift over the landscape at some times and a localization around a local master sequence at other times.

Fig. 3. Schematic phase diagram for a time dependency like Eq. (5).

Upon completing this work, we became aware of Ref. [19], in which a different approach towards dynamic replication landscapes is taken, using a stochastic time dependency in the landscape. The results presented there cannot be directly related to our findings here, because the equivalent of $R^*_{\langle\rangle}$ vanishes in [19], while the equivalents of $R^*_{\min}$ and $R^*_{\max}$ take on the same finite value. This leads to a different phase diagram than the one we observe here.
Abstract. A spatially resolved model of the RNA world is studied, in which primer-induced replication, concatenation and random cutting are considered. An increase in the diversity of sequences and in the complexity of shapes is observed; a hierarchically organized “replication network” is formed, and evolution to long sequences and assembly of further long sequences are obtained through it. These results suggest a scenario for overcoming the error threshold and for the evolution of enzymatic activity.



1 Introduction 

The RNA world is a hypothesis for the origin of life (see [1] for a review), and several attempts to simulate it have been made (e.g. [2]). An important problem here is that the error threshold [3] [4] prevents long molecules from increasing in the population.

In the present paper, we investigate this problem by simulating an RNA-replicator model on a plane, including secondary structure. We obtain a replication network with high diversity and an increasing variety of shapes. Our model thus appears to suggest a way to pass the information threshold.

Single-stranded RNA molecules can replicate by themselves (e.g. [5]), but their length is bounded by the error threshold. We therefore assume that they can also be concatenated, as is done with DNA [6].

Spatial resolution has several merits for generating sequence diversity. In particular, we believe that spatial clustering supports the elongation and repair of molecules via the replication network (see §3.1).

Recently, McCaskill et al. studied a DNA/RNA amplification system (e.g. [7]). They have shown the formation of a cooperative amplification network and reaction-diffusion-like patterns on a plane. While this work is similar to ours, we concentrate on RNA: we include RNA secondary structure, and different kinds of phenomena, such as the assembly of molecules, are obtained.
2 Model

The model consists of two parts. One is the plane where RNA polymers exist, 
the other is the “soup” which supplies oligomers and monomers. 

The plane is discretized into a grid (typically 100 × 100), and each grid cell can be occupied by a molecule that satisfies the minimum length limit (typically 5). The molecules are assumed to be attached to the plane, but limited diffusion by Margolus’ method [8] does occur.
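Margolus’ method is a block-partitioning scheme for diffusion in cellular automata: the grid is cut into 2 × 2 blocks whose origin alternates between (0, 0) and (1, 1) on successive steps, and each block’s contents are rotated by a quarter turn in a random direction. This is a minimal sketch assuming periodic boundaries, an even grid size and a hypothetical per-block move probability p_move to make diffusion limited; the paper does not specify these details.

```python
import random

def margolus_step(grid, step, p_move=0.5, rng=random):
    """One Margolus diffusion step on an n x n grid (n even, periodic edges).

    Blocks of 2 x 2 cells start at offset 0 on even steps and 1 on odd steps;
    each block is rotated a quarter turn (random direction) with prob. p_move.
    """
    n = len(grid)
    off = step % 2
    new = [row[:] for row in grid]
    for bi in range(off, n + off, 2):
        for bj in range(off, n + off, 2):
            if rng.random() >= p_move:
                continue
            # block cells in clockwise order: top-left, top-right,
            # bottom-right, bottom-left
            cells = [(bi % n, bj % n), (bi % n, (bj + 1) % n),
                     ((bi + 1) % n, (bj + 1) % n), ((bi + 1) % n, bj % n)]
            vals = [grid[i][j] for i, j in cells]
            if rng.random() < 0.5:
                vals = vals[-1:] + vals[:-1]   # clockwise quarter turn
            else:
                vals = vals[1:] + vals[:1]     # counter-clockwise quarter turn
            for (i, j), v in zip(cells, vals):
                new[i][j] = v
    return new
```

Because the blocks of one step never overlap, every molecule (and every empty cell) is moved exactly once per step, so the contents of the plane are conserved.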

The soup supplies oligomers (typically of length 2 to 5) and monomers as substrates for replication. The secondary structure of molecules is evaluated with the Vienna RNA package by Hofacker et al. [9].

We apply three reactions: replication, concatenation and random cut.

replication We assume that a primer is required to start replication and that replication proceeds only from the 5’ to the 3’ end. A dangling 5’ end is allowed only for the template (i.e. the template does not elongate). Folded, double-stranded molecules cannot be replicated. Point mutations are applied to the part being replicated. A threshold value for the binding energy (typically -2.50 kcal/mol, averaged over the number of bonds) is applied. When no matching primer molecule is found among the nearest neighbors, a random oligomer is taken from the soup.

concatenation The molecules can be concatenated when a molecule pairs 
two molecules in its neighborhood to form a hemiduplex strand, bridging over 
two strands. The threshold for the binding energy is applied for both matching 
parts. 

random cut The single-stranded regions of the molecules can be cut (typically 1.0 x 10“® per nucleotide). Double-stranded regions cannot be cut, so a folded molecule is more stable than a single-stranded molecule.
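The random-cut rule can be sketched on a molecule given as a sequence plus its dot-bracket structure. As a simplifying assumption of ours, the probability is applied per backbone bond rather than per nucleotide, and a bond is breakable only when both flanking nucleotides are unpaired, which makes paired, double-stranded regions immune to cutting as the text requires.

```python
import random

def random_cut(seq, structure, p_cut, rng=random):
    """Break backbone bonds inside single-stranded regions.

    The bond between nucleotides i and i + 1 can break (prob. p_cut) only if
    both are unpaired ('.') in the dot-bracket structure, so paired,
    double-stranded regions are never cut. Returns the resulting fragments.
    """
    fragments, start = [], 0
    for i in range(len(seq) - 1):
        if structure[i] == "." and structure[i + 1] == "." and rng.random() < p_cut:
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])
    return fragments

print(random_cut("GGGAAACCC", "(((...)))", p_cut=1.0))  # ['GGGA', 'A', 'ACCC']
```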



3 Results 

Both population growth and an increase in diversity are frequently obtained. Although the plane is sometimes occupied by trivial sequences (see §3.3), a variety of sequences and complex secondary structures are obtained in around 30% of the simulations (see §3.4).

To illustrate the behavior, let us consider the case in which the replication fidelity is 1.0. In Fig. 1a, the sequences of frequent molecular species, whose populations are larger than 5, are shown. In Fig. 1b, some of the longest sequences are shown. After 100,000 time steps there are about 3000 molecules, starting from a random initial condition with 100 short molecules. The secondary structure is shown in the bracket representation [10].
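The bracket representation encodes each base pair as a matching “(” and “)” and each unpaired nucleotide as “.”, so it can be parsed with a stack. The helper below also counts stems (maximal runs of stacked pairs); using the stem count as a proxy for how branched, or forked, a structure is, is our own illustrative choice and not the authors’ definition.

```python
def base_pairs(structure):
    """(i, j) index pairs encoded by a dot-bracket string, sorted by i."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)

def count_stems(structure):
    """Number of stems, i.e. maximal runs of stacked pairs (i, j), (i+1, j-1), ..."""
    pairs = set(base_pairs(structure))
    return sum(1 for i, j in pairs if (i - 1, j + 1) not in pairs)

print(base_pairs("((..))"))                 # [(0, 5), (1, 4)]
print(count_stems("((((..))..((..))))"))    # 3: one closing stem, two branches
```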



3.1 Replication network 

In Fig. 1a, some subsequences are shared by several of the frequent sequences. Such molecules can be ordered, as in Fig. 2. In this figure, starting from the longest reverse-complementary sequence pairs (at the top), subsequences that have lost their tail parts are ordered on either row, together with their reverse-complementary sequences.

Fig. 1. a: A list of frequent sequences from a snapshot of a simulation. On each line, a sequence, its secondary structure and its population size are shown. b: The longest sequences from the same snapshot. Some of them have 2 forks.

In some pairs, the population sizes are asymmetric (e.g. there are 2 copies of CCUCCGGCCCG and 26 of CGGGCCGGAGG). This implies an “elongation mechanism”. CCUCCGGCCCG is replicated 217 times and produced by random cut 108 times; 297 of these copies are used as primers for replicating longer sequences. Although this mechanism is limited to single-stranded molecules, we did obtain population growth of some long sequences by assembly (see §3.2). Note that the reverse-complementary sequence CGGGCCGGAGG cannot elongate, because it has no dangling 3’ end when it pairs with CCUCCGGCCCGCG.
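The reverse-complementary pairs discussed above can be checked mechanically. A minimal sketch, assuming Watson-Crick pairing only (A-U and G-C; G-U wobble pairs are ignored):

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Watson-Crick reverse complement of an RNA string (5' to 3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

print(reverse_complement("CCUCCGGCCCG"))  # CGGGCCGGAGG, its partner in the pair
```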

If we focus on the relationship between a long sequence and its subsequences that have lost their tails, the above elongation mechanism can be regarded as a “repair mechanism”. While long sequences are more fragile than short sequences, this mechanism supports them, and the diversity of the whole system is maintained.

The replication network is an important driving force for the “evolution to complexity”. As described above, elongation and repair processes maintain the variety of sequences, and long sequences are formed and maintained. While we have not investigated spatial pattern formation, we think that spatial clustering supports the above elongation and repair. Since reactions are local, the formation of clusters is trivial. Then, if a molecule elongates, its reverse-complementary sequence is expected to be found in the neighborhood, and the whole cluster can be elongated. Repair can proceed in a similar manner.

3.2 Assembly 

The long molecules (more than 20 nucleotides) usually have secondary structure; they cannot replicate but are formed via concatenation. However, in Fig. 1a, some folded sequences are reproduced. Such long sequences are “assembled” from frequent sequences. For example, CCUCCGGCCCGCGGGCCGGAGG is a combination of CCUCCGGCCCGC and GGGCCGGAGG. Further elongation is possible if there are dangling ends. Thus we may hope for a step-by-step extension of sequences.

Fig. 2. An example of the replication network. Each box shows a reverse-complementary pair. From the longest pair (the box at the top left), their subsequences are shown on either row (separated by the line), ordered by length. The number on the left of each sequence is its population size.
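The assembly example can be verified the same way: joining the two frequent sequences reproduces the 22-mer, and that 22-mer equals its own reverse complement, which is consistent with its ability to fold back on itself. Watson-Crick pairing only is assumed.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def is_self_complementary(seq):
    """True if the strand equals its own reverse complement."""
    return seq == reverse_complement(seq)

left, right = "CCUCCGGCCCGC", "GGGCCGGAGG"
assembled = left + right
print(assembled, is_self_complementary(assembled))  # CCUCCGGCCCGCGGGCCGGAGG True
```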

In Fig. 1b, some of the longest sequences are shown. Some molecules have multiple forks. About 30% of simulations have such complex secondary structures. The formation of long, multi-fork molecules can lead to the emergence of “ribozymes”. Indeed, we have obtained some interesting molecules that are about 100 nucleotides long, with three or more forks. Another merit of folded molecules is stability, since double-stranded regions are not broken by random cuts.



3.3 Parasites 

When the replication fidelity is high, parasitic sequences may take over. Trivial sequences, such as all-C or all-G, can easily replicate and concatenate because of the high probability of ligation of a primer. Once some all-C and all-G sequences meet and start replication, their populations grow rapidly. Such trivial sequences do not allow the formation of “ribozymes” with complicated secondary structures.



3.4 The Effects of Mutation 

We have calculated the average ratio of parasitic invasion, the maximum sequence length and the maximum population size (see Table 1). These values are averaged over 100 samples. The classification is evaluated as follows. A: more than half of the 10 longest molecules have multiple forks. B: secondary structures are seen less often, but the run is not invaded by parasites (i.e. quickly growing all-G/all-C sequences). C: the run is invaded by parasites.
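The three-way classification can be written as a small decision rule. The inputs (fork counts of the ten longest molecules and a flag for parasite invasion) are hypothetical summaries of a run, and giving class C precedence when a run is invaded is our assumption.

```python
def classify_run(fork_counts, invaded):
    """Hypothetical A/B/C rule for one run.

    fork_counts: numbers of forks of the 10 longest molecules.
    invaded: True if quickly growing all-G/all-C parasites took over.
    """
    if invaded:
        return "C"
    multi = sum(1 for forks in fork_counts[:10] if forks >= 2)
    return "A" if multi > 5 else "B"  # more than half of 10 have multiple forks
```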

On the right side of the table, averaging only A-class runs, the total population size (pop), maximum length (Lmax), maximum population size restricted to length > 10 (Pmax1) and maximum population size without restriction (Pmax2) are
1.0   64.3   23.1   35 61 4   55.4   15.3

Fidelity   A B C (%)   pop      Lmax   Pmax1   Pmax2
0.80       27 63 10    2544.3   48.6   2.2     9.9
0.70       19 68 13    2384.5   43.8   2.0     8.8

Table 1. Classification of simulation results and statistics. See text.



shown. The averaged maximum length and population size decrease as the fidelity decreases; this is a trivial result of mutation. Mutation also suppresses parasitic invasion by giving secondary structure to mutants of all-C and all-G sequences. The percentage of A-class cases has an optimum at a fidelity of 0.90, due to mutants of parasites that have several forks. However, the maximum population size is only a few at this point. Both rich diversity and population growth of a molecule are obtained at a fidelity of around 0.95.

4 Summary 

The spatial clustering supports the complexity of the replication network, in which elongation and repair of sequences occur. Within the replication network, frequent sequences increase the population of assembled long sequences, whose complexity increases further, beyond the boundary of the error threshold. Enzymatic activity may then be expected to arise. We will study the transition to replicase-dominated dynamics in the near future.

One of the authors (TY) is supported by a fellowship from the Japan Society for the Promotion of Science.
Abstract. Divergence in antigen response of the immune network is discussed, based on shape-space modelling. The present model extends the shape-space model by introducing the evolution of specificity of idiotypes. When the amount of external antigen increases, the stability of the immune network changes and the network responds to the antigen. It is shown that specific and non-specific responses emerge as a function of antigen levels. A specific response is observed with a fixed point attractor, and a non-specific response is observed with a long-lived chaotic transient state of the lymphocyte population dynamics. The network topology also changes between these two states. The relevance of such a long-lived transient state is discussed with respect to immune function.



1 Introduction 



The ‘lock and key’ concept has been central to understanding the specificity of biochemical molecular interactions, from enzyme-substrate relationships to antigen-antibody matchings. However, it has gradually been realized that such a ‘lock and key’ concept is not strictly valid, particularly in immune systems [1]. Antigen-antibody interactions are found to be plastic or ‘multispecific’ rather than fixed or single-specific; that is, antibodies inherently have a flexible recognition capacity. Kearney et al. [2] have confirmed experimentally the existence of such ambiguity of recognition in the antibody binding site of immature B cells. It is generally believed that development from ambiguous to specific recognition is caused by somatic hypermutations [3]. Here we propose a new dynamics of specificity evolution based on Jerne’s network hypothesis [4]. Our model is characterized by a meta-dynamics of idiotype specificity on shape-space [5]. We show that specific and non-specific responses to an antigen are governed dynamically by a fixed point attractor and a chaotic long-lived transient state of an immune network, respectively. The relevance of such a long-lived transient state is discussed with respect to immune function.
2 Modeling with a Meta-dynamics of Specificity

We first introduce the standard idiotypic network model. Each idiotype is characterized by a pair of surface sites, called the idiotope and the paratope. If the idiotope site of a lymph cell is bound by paratopes of other lymph cells, the recognized lymph cells become inactivated, whereas the recognizing cells become activated. Thus the growth dynamics of the clone size of an idiotype with paratope k and idiotope j is given by



-n+l 






dXkj + s, (1) 



The idiotope-paratope interaction $b_{ij}$ is assumed to have an exponential form. We characterize the ambiguity of the antigen-antibody interaction by the deviation parameter $\sigma$. The proposed meta-dynamics controls this parameter. First, as a simple example, we quantize $\sigma$ in powers of 2: $\sigma_m = 2^{-m}$. The maximum specificity is given by $m = M$.

Now each idiotype is characterized by three variables: paratope k, idiotope j, and the specificity m. We thus describe the evolution of specificity as follows:



~ (1 A* )Xk,j,m + M /2 + ^m=l) (2) 

where μ' is the mutation rate of specificity. Here the source term $S_{m=1}$ is added for the least specific antibody. This dependency reflects the fact that premature B cells are believed to have lower specificities.

By combining these equations, we establish the complete clone growth dy- 
namics with mutations among idiotypes and the evolution of the specificities. 

In our model, there are five different types of idiotopes and of paratopes, so that there are 25 different idiotypes, with M = 5 different levels of specificity. The remaining system parameters (i.e. μ' = 0.3, s = 1.0, d = 0.1, and a = 2.0) are selected so that the size of each clone never diverges.

The following results (especially the natural tolerance at high amounts of antigen) have been confirmed not to depend on the values of the mutation rate μ' and the source s. The dependence on system size is still unclear.



3 Dynamical Natures of the Network 

We pay most attention to how the idiotype network responds to persistent antigenic stimulation. A static antigen with binding site k is introduced by adding the constant term $+A_k b_{k,j} X_{ij}$ to the above equation. By estimating the mean network specificity $Sp_k$, obtained by averaging the specificity of all idiotypes bearing paratope type k, we study the antigenic effect on the network dynamics.

An antigen of type 4 is used as an example, but the following result does not 
depend on the selected antigen type. Because we adopt the periodic boundary 
condition for the shape-space, each idiotype is equivalent within a network. 




Fig. 1. The average network specificity and the maximum Lyapunov exponents are plotted against the antigen level in Fig. 1(a) and (b), respectively, and in Fig. 1(c) the stability of the type I attractor is plotted against the antigen level. Stability is measured by the proportion of idiotypes in the initial distribution that show a transition from a type I to a type II state before a given time step. The time intervals used are 10,000, 50,000, and 100,000 steps. The number of initial distribution sets is 100.



We show a plot of the averaged specificity (Spi) and the maximum Lyapunov 
exponent under the antigen stimulations over 10^ steps (see Fig. 1(a), (b)). 

In Fig. 1(a), as expected, the network specificity increases when we increase the amount of antigen. At about 9.5 units of antigen, however, the specificity abruptly jumps to a high value. We say that a specific response has occurred at this antigen level. This specific response is observed until the antigen level reaches 13.5 units. Beyond this critical value, the specific response is no longer observed; instead, the specificity is sustained at lower values. This lower sustained response can be compared to natural tolerance to the antigen.

On the other hand, comparing Fig. 1(a) with (b), we notice that when the amount of antigen is set between 9.5 and 13.5 units, the lower specificity emerges with chaotic dynamics, and the higher specificity emerges with fixed point dynamics. We shall call the former dynamics a type I attractor and the latter a type II attractor.

However, the type I attractor is not a true attractor. It was found to be a long-lived transient state, referred to as a super-transient state, which is a common phenomenon in high-dimensional dynamical systems [6]. In Fig. 1(c), when the observation period is extended, we observe a transition from the type I to the type II attractor. There is no inverse transition from the type II to the type I attractor. The super-transient states are highly dependent on the antigen level. For example, when the antigen level is 11.5 units in Fig. 1(c), the transition probability from type I to type II is still less than 12 percent. In such cases, the type I state behaves as an attractor in a practical sense.

From a practical viewpoint, response time is also worth noting. If we say that the relevant time scale for the immune response should be less than 10,000 time steps, then in a practical sense there is no specific response even at higher levels of antigen (see Fig. 1(c)). Our results suggest that a certain level of antigen causes the super-transient state to suppress fast immune responses in the idiotype network.

Fig. 2. Topology of an idiotype network of type II and I attractors is shown in Fig. 2(a) and (b), respectively, for an antigen level of 12 units. Only idiotypes that have an average population of 1.25 over 10,000 time steps are depicted. Each idiotype in this figure is represented by a pair of symbols, a square and a circle. The square with a numeral inside denotes the paratope, whereas the circle with a numeral inside denotes the idiotope. The triangle with a number inside represents the injected antigen type. A stimulation wave from idiotope to paratope is shown by a dotted line.

Besides the response time, much attention has been paid in the field of theoretical immunology to topological changes of the network [7, 8, 9]. Here we argue that the transition from type I (unspecific) to type II (specific) causes a simultaneous change of network topology. The network topology of each of these two states is shown in Fig. 2.

As we see from the figure, a chaotic super-transient state of type I has a 
more complex network than does type II. Inversely, higher specificity to the 
dosed antigen is maintained by a simpler network structure. The maintenance 
of idiotypic diversity can be attributed to chaotic dynamics. 

By estimating the specificity of all idiotypes in the distributed state of type I and in the localized state of type II, respectively, we find that each idiotype in the distributed state of type I has, on the whole, a low specificity. Namely, each idiotype interacts weakly with many idiotypes and thereby attains high connectivity. As a result, the stimulation of the network by the dosed antigen is distributed over the network rather than concentrated only on idiotypes bearing a binding site (paratope of type 4) for the antigen. Thus, the immune response to the antigen tends to be suppressed. This result would support Stewart’s extrapolation that ‘The higher connectivity among idiotypes, the greater the degree of tolerance’ [10].
Recently, a chaotic oscillation was found experimentally in a natural tolerant state. Subsequently, theoretical immunologists have tried to establish ‘natural tolerance under chaotic dynamics’ against a static antigen [7, 8], though their simulation results show that such a tolerance is difficult to establish without assuming a special network topology (an ‘odd-loop structure’) and a so-called ‘bell-shaped function’ as the activation function. We have shown how such a tolerance can arise naturally under chaotic dynamics, without these assumptions, by adding one more degree of flexibility, namely meta-mutation dynamics of idiotype specificity. We have used a simple idiotypic network model and have not ventured to use the more complex ‘bell-shaped function model’, because we wished to focus on the capabilities of the meta-dynamics we introduced. Applying these meta-dynamics to the bell-shaped function model is left as a future problem.

4 Concluding Remarks 

In this paper, we have expanded the possibilities of theoretical immunology by 
introducing new meta-dynamics. We believe that the immune response should 
be seen as having a more dynamic nature than allowed by most current models 
[11], and that the specific antigen-response and the dynamical percolation related 
to natural tolerance are caused by the meta-dynamics controlling the degree of 
specificity, as introduced here. 
Abstract. Natural collective systems, such as social insects, provide us with an existence proof that remarkable feats of construction can be achieved by ‘simple’ agents. Such feats appear to demand impressive control and coordination, which is even more remarkable since the agents are not provided with an overall ‘blueprint’ for construction. However, as a consequence of the agents carrying out simple rules, an emergent macroscopic structure can develop. In an attempt to understand some of the underlying principles, this paper deals with patterns formed by a swarm of mobile agents in a lattice. In particular, a morphological classification of the formed structures is provided, and classes of simple, complex and non-trivially ordered structures are characterised. In these experiments, all agents start their evolution at the same site in the lattice. This investigation extends the ideas of [1] and [18] by giving agents simple rules, based on neighbourhood characteristics, which govern whether they move or become static. The final outcomes, defined by the immobility of all agents, are studied, and the global static structures created are presented and discussed. Since the rules are parameterised, the paper reports on the selection of rule types that generate classes of ordered structures.



1 A problem and the ideology 

Structure formation and self-organization processes have become topics of great importance in many different scientific fields such as chemistry, materials science, plasma physics, hydrodynamics and particularly the biological sciences (see e.g. [20], [4], [16], [17], [12], [9] and [19]). In all these fields the investigations focused on explaining how a given set of patterns could be generated in certain systems and, particularly, how the underlying developmental mechanisms functioned.

The recently emerged fields of nanotechnology and nanorobotics (albeit in their infancy) (see e.g. [8], [15] and [14]), collective robotics [18], distributed building and stigmergic behaviour ([10], [5], [21], [7] and [6]), and fabrication of smart matter [13] also employ pattern formation as a key underpinning mechanism, with the emphasis on its pragmatic realisation. The problem can be summarised as finding the correspondence between domains of the parameter space and the formation of patterns with desirable properties, for a given set of discrete entities with a definite set of local parameters.

* Some parts of this work were supported by Hewlett Packard Laboratories, Bristol, under the External Research Programme.

This paper attempts to bring together different strands of previous work, such as the distributed spatial sorting of physical objects in robot collectives [18], interval parameterization of the spatio-temporal excitation dynamics in multidimensional excitable media and lattice swarms [1], the complexity of automata models of morphogenesis [2], and designs of swarm-based algorithms [3], and to employ them in the specific problem of pattern formation in lattice swarms.

The experimental domain is constrained to the following problem: 

Consider a set of minimalist agents which can move randomly over a 2D lattice (mobility). The agents do not interact with each other but can sense the presence or absence of agents at the nodes immediately around their current position. The agents move according to simple rules and finally stop moving and take up a resultant configuration. The problem is to identify and classify these possible final configurations in terms of order. The rules governing mobility are based on two parameters: firstly, the frequency of checking the immediate neighbourhood nodes for the presence or absence of agents (degree of activity), and secondly, the number of occupied nodes (sensitivity).

The model employed the following features. Firstly, agents only take account 
of the number of immediately adjacent nodes which are occupied by one or more 
agents. Secondly, agents can only move one step in a random direction. This is 
equivalent to a high degree of noise. Thirdly, the size of the final distribution 
of static agents is polynomially bounded by the number of agents. Fourthly, all 
trials start with agents injected into one node. This is considered important since 
the potential implementation of such an approach with real robots may involve 
small machines being injected at a point source. 



2 The model 

Let $I$ be a set of $m$ uniform agents which move at random on the 2D lattice $L$ in discrete time. At every time step $t$, agent $i$ either stays at its node $c_i^t$ or moves to one of the eight nodes of its neighbourhood $u(c_i^t)$. The agent moves if there is another agent at the same node or if the number of agents in $u(c_i^t)$ lies in some specified interval. At the beginning of the trial all agents are at the same node $c^*$: $\forall i \in I: c_i^0 = c^* \in L$. The agents do not interact with each other, so every agent simply senses the presence or absence of one or more agents at each of the neighbouring positions and at its current node, i.e. at the nodes $c_i^t$ and $u(c_i^t) = \{v \in L : |v - c_i^t| = 1\}$.

An agent moves if, after checking its neighbourhood, it finds that its node is occupied by another agent or that the number of occupied neighbouring nodes belongs to some specified interval $[\theta_1, \theta_2]$, $0 \le \theta_1 \le \theta_2 \le 8$. In some versions of the model, agents do not check their neighbourhood at every step. An agent that does not check its neighbourhood at a given step will not move, even if its own node is occupied by one or more other agents.

When agent $i$ at node $c_i^t$ decides to change its position, it walks to a randomly chosen neighbouring node. It does so if (i) it decides to check its neighbourhood, and one of the following situations takes place: (ii) there is another agent $j \neq i$ at the same node, $c_j^t = c_i^t$; or (iii) the number of nonempty elements of $u(c_i^t)$ lies in the interval $[\theta_1, \theta_2]$.

We represent nodes of $L$ by their integer coordinates and use the additional notation $\alpha = |\{j \in I, j \neq i : c_j^t = c_i^t\}|$ and $\beta = |\{j \in I : c_j^t \in u(c_i^t)\}|$; $d$ is a random variable over $\{-1, 0, 1\}^2$, and $\nu : \mathbb{Z} \to \{0, 1\}$ is the function such that $\nu(x) = 1$ if $x > 0$ and $\nu(x) = 0$ otherwise. Also, $\zeta(\alpha, \beta, \theta_1, \theta_2) = 1$ if $\nu(\alpha) \vee (\nu(\beta - \theta_1) \wedge \nu(\theta_2 - \beta))$, and it equals 0 otherwise.
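The activation function ζ can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the names `nu` and `zeta` are our own, and we assume the closed interval [θ1, θ2] used throughout the prose of the paper.

```python
def nu(x: int) -> int:
    """Step function nu: Z -> {0, 1}; returns 1 if x > 0 and 0 otherwise."""
    return 1 if x > 0 else 0


def zeta(alpha: int, beta: int, theta1: int, theta2: int) -> int:
    """Activation of one agent.

    alpha: number of *other* agents on the agent's node.
    beta: occupancy count of the eight neighbouring nodes.
    Returns 1 if the node is shared or beta lies in [theta1, theta2].
    Assumption: the interval is treated as closed, as in the prose.
    """
    shared = nu(alpha) == 1
    in_interval = theta1 <= beta <= theta2
    return 1 if shared or in_interval else 0
```

A shared node always activates the agent, whatever the interval; otherwise only the neighbourhood count matters.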

The basic equation of motion for agent i therefore is: 

$c_i^{t+1} = c_i^t + d \cdot \zeta(\alpha, \beta, \theta_1, \theta_2) \qquad (1)$

In this rule all agents act simultaneously: they are active at every step of their evolution. We call the model with agents obeying equation (1) the busy swarm. In contrast, in some trials an agent checks the number of occupied nodes in its neighbourhood only every $m$th step of the simulation time. These agents are considered to be members of the lazy swarm. An agent of the lazy swarm updates its position as follows:

$c_i^{t+1} = c_i^t + d \cdot \zeta(\alpha, \beta, \theta_1, \theta_2) \cdot \nu(1 - |i - (t \bmod m + 1)|) \qquad (2)$

so that agent $i$ ($i = 1, \ldots, m$) acts only at the steps for which $t \bmod m + 1 = i$.
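To make rules (1) and (2) concrete, here is a small simulation sketch. The assumptions are ours: the lattice is unbounded, β is taken as the number of occupied neighbouring nodes (equal to the agent count once no node is shared), the move d is drawn from the eight non-zero offsets (following "walks to a randomly chosen neighbouring node"), and rule (2) is read as agent i acting only when t mod m + 1 = i, so exactly one agent (0-based index t mod m) may move per step. `run_swarm` and `activated` are hypothetical helpers.

```python
import random
from collections import Counter

# The eight neighbouring offsets of a node (|v - c| = 1 in the max norm).
OFFSETS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def activated(pos, counts, theta1, theta2):
    """zeta for one agent: its node is shared, or its occupied-neighbour count is in [theta1, theta2]."""
    alpha = counts[pos] - 1  # other agents on the same node
    x, y = pos
    beta = sum(1 for dx, dy in OFFSETS if counts[(x + dx, y + dy)] > 0)
    return alpha > 0 or theta1 <= beta <= theta2


def run_swarm(m, theta1, theta2, mode="busy", seed=0, max_steps=200_000):
    """Inject m agents at the origin and iterate until no agent is activated.

    mode="busy": every agent may move at every step (rule (1), synchronous).
    mode="lazy": only agent t mod m may move at step t (our reading of rule (2)).
    Returns (positions, steps_taken, converged).
    """
    if mode not in ("busy", "lazy"):
        raise ValueError("mode must be 'busy' or 'lazy'")
    rng = random.Random(seed)
    positions = [(0, 0)] * m
    for t in range(max_steps):
        counts = Counter(positions)  # missing keys read as 0
        active = [activated(p, counts, theta1, theta2) for p in positions]
        if not any(active):
            return positions, t, True  # stationary global state reached
        new_positions = []
        for i, (p, a) in enumerate(zip(positions, active)):
            checks = mode == "busy" or i == t % m
            if a and checks:
                dx, dy = rng.choice(OFFSETS)
                p = (p[0] + dx, p[1] + dy)
            new_positions.append(p)
        positions = new_positions
    return positions, max_steps, False
```

For instance, `run_swarm(200, 2, 3, mode="lazy")` would simulate a lazy swarm with activation interval [2, 3]; how quickly a given interval converges is something to check empirically, which is why `max_steps` bounds the loop.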

3 Convergence 

For swarms executing equations (1) and (2), a state is always reached in which no agent moves any further. This is referred to as a stationary global state.

Analysis of the activity patterns of swarms (see, e.g., Figure 1), measured as the number of agents that change their position at the current time step, shows that the lower and upper bounds of the intervals significantly influence the rate of convergence to the stationary global state.

Hereinafter, the term pattern refers to a finite sub-lattice of $L$ in which every node takes the value 1 if there is an agent at the node and the value 0 if there is no agent at the node. The pattern is determined after the swarm converges to the stationary global state.
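A pattern in this sense can be extracted from the final agent positions as a 0/1 array. In this sketch the finite sub-lattice is assumed to be the bounding box of the agents; `to_pattern` is a hypothetical helper.

```python
def to_pattern(positions):
    """Return the 0/1 sub-lattice (bounding box) of a set of agent positions.

    Row index follows y and column index follows x; a node is 1 if occupied.
    """
    xs = [x for x, _ in positions]
    ys = [y for _, y in positions]
    x0, y0 = min(xs), min(ys)
    width, height = max(xs) - x0 + 1, max(ys) - y0 + 1
    grid = [[0] * width for _ in range(height)]
    for x, y in positions:
        grid[y - y0][x - x0] = 1
    return grid


# Render a small pattern with '#' for occupied and '.' for empty nodes.
for row in to_pattern([(0, 0), (2, 1), (1, 2)]):
    print("".join("#" if v else "." for v in row))
```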

4 Morphological classification 

The set of all activation functions can be subdivided into several classes based on the morphological characteristics of the patterns they produce. Two functions are in the same class if, when applied to the agents of the swarm, they form the same type of pattern.
Fig. 1. Examples of agent activity in the evolution of swarms from different classes: number of active agents vs. time, m = 1000.



Sameness is determined visually and verified with the density distributions of agents and the ω-order measure.

The following seven classes (Figure 2) are identified in both busy and lazy swarms:

C-class: Condensed patterns. Almost every agent has a neighbourhood fully packed with agents. The patterns are solid-like.

S-class: Sparse patterns. Every agent has an empty neighbourhood. The patterns are liquid-like.

H-class: Halo patterns with a condensed core. Agents with fully packed neighbourhoods are still in the majority, but a certain number of outsiders with less crowded neighbourhoods is already visible. The patterns recall melting ice, so we can think of them as two-phase, transitional structures.

P-class: Porous patterns. They look like grids.

L-class: Labyrinthine patterns. They are classical (compare with diffusion, magnetic domains, chemicals on the surface of a catalyst, etc.). Corridors of the labyrinths are usually short. There are many turns and free-standing walls.

F-class: Fingerprint-like patterns. The patterns are subdivided into a few large domains in which the walls of the labyrinth are polarized vertically or horizontally. There are usually few passages between the domains.

M-class: Porous patterns with large holes inside and some halo outside. They vary in appearance, but a meristem-like structure is common to all of them.



As one can see from the examples, the classification is not perfect and some ambiguity remains. The correspondence between the classes and the intervals $[\theta_1, \theta_2]$ of agent activation is shown in Figure 3.

5 Lazy does better: ω-order

Labyrinthine patterns represent the perfect example of highly ordered structures in the systems under consideration. In the ideal case (Figure 4) the density distribution shows that the majority of the agents have two agents in their neighbourhoods, and these two agents are in the same column or the same row of $L$. This suggests the following measure of order: given a pattern $P$ generated with activation interval $[\theta_1, \theta_2]$, the so-called wall order is calculated as $\omega = \max\{\omega_v, \omega_h\}$, where $\omega_v$ and $\omega_h$ denote the numbers of vertical and horizontal wall configurations, respectively.

Fig. 2. Representative stationary configurations for the morphological classes of lattice swarm, m = 1000.

Fig. 3. Interval classification of the lazy swarm.

Fig. 4. Density distributions $D = (D_i)_{0 \le i \le 8}$, $D_i = |\{j \in I : |\{a \in I : |s_a - s_j| = 1\}| = i\}|$, in the stationary patterns for the morphological classes of lattice swarm, m = 1000.
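The density distribution and the wall order can be computed from a stationary pattern roughly as below. The original diagrams of the local configurations are not available here, so we assume (our reading, consistent with the bound 0 ≤ ω ≤ m − 2 attained on a line) that ω_v counts agents whose only neighbours are the two agents directly above and below, and ω_h those whose only neighbours are directly to the left and right. Positions are assumed distinct, as in a stationary state; all function names are hypothetical.

```python
# Eight neighbouring offsets of a node.
OFFSETS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def occupied_neighbours(pos, occupied):
    """Set of occupied nodes among the eight neighbours of pos."""
    x, y = pos
    return {(x + dx, y + dy) for dx, dy in OFFSETS} & occupied


def density_distribution(positions):
    """D[i] = number of agents with exactly i occupied neighbouring nodes (0 <= i <= 8)."""
    occupied = set(positions)
    D = [0] * 9
    for p in occupied:
        D[len(occupied_neighbours(p, occupied))] += 1
    return D


def wall_order(positions):
    """omega = max(omega_v, omega_h) under the vertical/horizontal wall reading above."""
    occupied = set(positions)
    omega_v = omega_h = 0
    for x, y in occupied:
        nb = occupied_neighbours((x, y), occupied)
        if nb == {(x, y - 1), (x, y + 1)}:
            omega_v += 1  # vertical wall segment
        elif nb == {(x - 1, y), (x + 1, y)}:
            omega_h += 1  # horizontal wall segment
    return max(omega_v, omega_h)
```

On a straight line of m agents the m − 2 interior agents each count once, while a fully packed block scores 0.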



For any pattern $P$ formed by $m$ agents, $0 \le \omega \le m - 2$. The measure takes its maximum on a line of agents (where agents are arranged along the same column or the same row), and it takes its minimum on either densely packed or sparse patterns. We therefore conclude that

Lazy swarms produce more ordered patterns than busy swarms.
This is because ω takes its maximal value on the patterns generated by functions from the F-class, and all other classes have considerably lower values of ω-order. The F-class was not found amongst the members of the busy swarm. Admittedly, equations (1) and (2) leave some room for criticism. To verify the features of the lazy swarm, we show how tuning the laziness of agents changes the order of the structures they generate. Let us look at the following rule:

$c_i^{t+1} = c_i^t + d \cdot \zeta(\alpha, \beta, \theta_1, \theta_2) \cdot p \qquad (3)$

where $p$ is a random variable that takes the value 1 with probability $p_p$, $\varepsilon < p_p < 1$ ($\varepsilon$ is very small), and the value 0 with probability $1 - p_p$. The larger $p_p$, the more agents decide to check their neighbourhood and, possibly, change their positions at each time step. We can say that agents have random, independent switching delays. There is, therefore, a probabilistic continuum between the lazy swarm and the busy swarm, with the lazy swarm at one end of the continuum and the busy swarm at the other.
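The three update rules differ only in the factor that decides whether agent i checks its neighbourhood at step t. The sketch below puts these factors side by side. It reads rule (2) as the factor ν(1 − |i − (t mod m + 1)|) with agents numbered 1..m, which is our reconstruction, and the function names are hypothetical.

```python
import random


def nu(x: int) -> int:
    """Step function: 1 if x > 0, else 0."""
    return 1 if x > 0 else 0


def gate_busy(i: int, t: int) -> int:
    """Rule (1): every agent checks its neighbourhood at every step."""
    return 1


def gate_lazy(i: int, t: int, m: int) -> int:
    """Rule (2), as reconstructed: agent i (1..m) checks only when i == t mod m + 1."""
    return nu(1 - abs(i - (t % m + 1)))


def gate_random(p_p: float, rng: random.Random) -> int:
    """Rule (3): an agent checks with probability p_p, independently of the others."""
    return 1 if rng.random() < p_p else 0


# Over any m consecutive steps, each lazy agent checks exactly once.
m = 5
print([sum(gate_lazy(i, t, m) for t in range(m)) for i in range(1, m + 1)])
```

Setting p_p close to 1 makes `gate_random` behave like `gate_busy`, while small p_p spreads the checks out in time, loosely resembling the lazy schedule.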




Fig. 5. Order ω vs. degree of activity $p_p$ (see rule (3)), computed for a swarm of m = 1000 agents.



Experiments employing equation (3) (see, e.g., Figure 5) show that

Lazy swarms generate more order than busy swarms.

Figure 5 shows that the order of the generated structures is inversely proportional to the degree of agent activity. The remainder of the paper deals only with the lazy swarm.

6 Long transients and attractors of order 

Size, convergence time and order are the essential parameters which, more or less successfully, characterize spatio-temporal systems. They are attributes of the swarm in general. Sensitivity, in contrast, is an agent-based, local property. How do they relate to each other?

An 8 × 8 node “sensitivity” lattice is indexed by $(\theta_1, \theta_2)$ pairs. We wish to investigate how changing the $[\theta_1, \theta_2]$ interval affects the global stationary state.

The convergence time $t_c$, such that $c_i^{t+1} = c_i^t$ for all $t > t_c$ and for every agent $i$, is chosen as the temporal attribute. The radius of a pattern of agents, $r = \max_{j \in I} |c_j - c^*|$, is the size attribute. And, as in the previous section, $\omega(\cdot)$ gives us the measure of order. In computer experiments with swarms of 1 to 1000 agents, we show that $r(m)$ and $t_c(m)$ are reasonably well approximated by expressions governed by the parameters $\pi_r(\theta_1, \theta_2)$ and $\pi_t(\theta_1, \theta_2)$, respectively. So, patterns built by agents with sensitivity intervals $[\theta_1, \theta_2]$ are characterised by three dimensionless values: $\pi_t(\theta_1, \theta_2)$, $\pi_r(\theta_1, \theta_2)$ and $\omega(\theta_1, \theta_2)$.
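The two swarm-level attributes can be measured from a recorded run as sketched below. We assume the max (Chebyshev) norm for |·|, consistent with the eight-node neighbourhood, and take t_c as the first recorded time from which the configuration no longer changes; both helper names are hypothetical.

```python
def radius(positions, source=(0, 0)):
    """r = max_j |c_j - c*|, using the max norm and the injection node c* as source."""
    sx, sy = source
    return max(max(abs(x - sx), abs(y - sy)) for x, y in positions)


def convergence_time(trajectory):
    """First time index after which the recorded configuration never changes.

    trajectory[t] is the list of agent positions at time t.
    """
    t_c = 0
    for t in range(1, len(trajectory)):
        if trajectory[t] != trajectory[t - 1]:
            t_c = t  # the configuration was still changing at time t
    return t_c
```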
Fig. 6. Time, size and order fields constructed in computer experiments with the lazy swarm.



Using these values we construct a vector-like field in which the unit vector at node $x$ points toward the neighbouring node $y$ that takes the maximal value in the closest neighbourhood of $x$. From Figure 6 we see that (i) swarms with a wide range and a low threshold of sensitivity have the longest convergence time, and (ii) swarms with a narrow range and a high threshold of sensitivity produce the most ordered patterns.
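The vector-like field over the 8 × 8 sensitivity lattice can be built as in the following sketch. We assume that the closest neighbourhood is the eight surrounding nodes, that nodes with no higher-valued neighbour get a zero vector (these are the sinks, i.e. attractors), and that `None` marks nodes outside the valid region, such as θ1 > θ2. `vector_field` is a hypothetical helper.

```python
import math


def vector_field(values):
    """Unit vector at each node toward its highest-valued neighbour.

    values: 2D list indexed [row][col]; None marks invalid nodes.
    Returns (field, sinks): field maps (row, col) to a (d_row, d_col) unit
    vector; sinks lists nodes with no higher neighbour (zero vector).
    """
    rows, cols = len(values), len(values[0])
    field, sinks = {}, []
    for i in range(rows):
        for j in range(cols):
            v = values[i][j]
            if v is None:
                continue
            best, best_val = None, v
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    if (di or dj) and 0 <= ni < rows and 0 <= nj < cols:
                        w = values[ni][nj]
                        if w is not None and w > best_val:
                            best, best_val = (di, dj), w
            if best is None:
                field[(i, j)] = (0.0, 0.0)
                sinks.append((i, j))
            else:
                norm = math.hypot(*best)
                field[(i, j)] = (best[0] / norm, best[1] / norm)
    return field, sinks
```

Applied to a time, size or order table, the `sinks` list gives the candidate attracting nodes.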

In terms of a dynamical system, the vector fields constructed over the sensitivity lattices show the following: (i) both the S and F classes are attractors in the time field (Figure 6, A, shows the attractors at nodes [1, 4] and [6, 7]); (ii) the S class is an attractor in the size field ($\theta_1 = 1$ and $4 \le \theta_2 \le 8$; Figure 6, B); (iii) the F class is an attractor in the order field (Figure 6, C, shows the attracting nodes [5, 7] and [6, 8]).

Assuming that ω-ordered patterns are desirable results of pattern formation in swarm systems, and calling agents with a high threshold of excitation rough agents, we arrive at the following proposition:

Rough and lazy agents are better than sensitive and busy agents.

All our results can be easily applied to 3D swarms. We do not have space to discuss 3D models here; however, we provide one example of a 3D pattern (Figure 7). The interior of the cluster is separated from the surrounding space and is highly ordered inside.

Fig. 7. Stationary pattern of 1000 agents in a 3D lattice. In the evolution, every agent decides to move if the number of other agents in its 3 × 3 × 3 node neighbourhood belongs to the interval [20, 25]. One eighth of the cluster is removed to show the internal structure.



7 Conclusion 



A mapping between simple rule sets and emergent global structures has been investigated and presented. The study has attempted to answer some of the questions pertinent to which parameters of the rules executed by swarm agents should be chosen to control the morphology of the emergent global structure. Two basic parameters have been taken into account to parameterise the abilities of swarms to build complex and ordered structures. The first is the degree of sensitivity, measured as the precise occupancy of an agent's neighbourhood. The second is the degree of activity, which determines how often an agent analyses the state of its vicinity.

In computer experiments we (i) computed the complete mapping between the agent parameter space and the morphological classes of the formed patterns, (ii) established a correspondence between local agent properties, dynamical features of the swarm systems and morphology, and (iii) showed how ordered, non-trivial structures emerge in families of mobile swarms.

This study highlights the possibility of employing such emergent principles 
to constructions varying in scale from the very small to the very large. 







References 

1. Adamatzky A. and Holland O. Edges and computation in excitable media. In: Proc. ALIFE VI: 6th Int. Conf. on Artificial Life (MIT Press, 1998) 379-383.

2. Adamatzky A. I. Simulation of inflorescence growth in cellular automata. Chaos, Solitons and Fractals 7 (1996) 1065-1094.

3. Adamatzky A. and Holland O. Voronoi-like nondeterministic partition of a lattice by collectives of finite automata. Mathl. Comput. Modelling 28 (1998) 73-93.

4. Babloyantz A. Molecules, Dynamics and Life (New York: Wiley, 1986).

5. Bonabeau E., Theraulaz G., Deneubourg J.-L., Franks N.R., Rafelsberger O., Joly J.L. and Blanco S. A model for the emergence of pillars, walls and royal chambers in termite nests. Philosophical Trans. Royal Soc. London Ser. B - Biol. Sci. 353 (1998) 1561-1576.

6. Bonabeau E. From classical models of morphogenesis to agent-based models of pattern formation. Artificial Life 3 (1997) 191-211.

7. Camazine S. Self-organizing pattern formation on the combs of honey bee colonies. Behav. Ecol. Sociobiol. 28 (1991) 61-76.

8. Drexler E. Nanosystems: Molecular Machinery, Manufacturing, and Computation (Wiley, 1992).

9. Ermentrout B., Campbell J. and Oster G. A model for shell patterns based on neural activity. Veliger 28 (1986) 369-388.

10. Franks N.R., Wilby A., Silverman B.W. and Tofts C. Self-organizing nest construction in ants: sophisticated building by blind bulldozing. Animal Behaviour 44 (1992) 357-375.

11. Grassé P.-P. La reconstruction du nid et les coordinations interindividuelles chez Bellicositermes natalensis et Cubitermes sp. La théorie de la stigmergie: essai d’interprétation du comportement des termites constructeurs. Insectes Sociaux 4 (1959) 41-84.

12. Swinney H.L. and Krinsky V.I. (Eds.) Waves and Patterns in Chemical and Biological Media (North-Holland, 1991).

13. Hogg T. and Huberman B.A. Controlling smart matter. Smart Materials and Structures 7 (1998) 1-14.

14. Holland O. and Melhuish C. Getting the most from the least: lessons for the nanoscale from minimal mobile agents. Proc. Artificial Life V (Nara, Japan, 1996) 59-66.

15. Lewis M.A. and Bekey G.A. The behavioural self-organization of nanorobots using local rules. Proc. 1992 IEEE/RSJ Intern. Conf. Intelligent Robots and Systems (Raleigh, NC, 1992) 1333-1338.

16. Markus M., Müller S.C. and Nicolis G. (Eds.) From Chemical to Biological Organization (Berlin: Springer, 1988).

17. Meinhardt H. Models of Biological Pattern Formation (New York: Academic Press, 1982).

18. Melhuish C.R., Holland O.E. and Hoddell S.E.J. Collective sorting and segregation in robots with minimal sensing. Proc. 5th Intern. Conf. on Simulation of Adaptive Behaviour (Zurich, 1998) 465-470.

19. Murray J.D. Mathematical Biology (Springer, 1989).

20. Nicolis G. and Prigogine I. Self-Organization in Non-Equilibrium Systems (New York: Wiley, 1977).

21. Theraulaz G. and Bonabeau E. Coordination in distributed building. Science 269 (1995) 686-688.

22. Toffoli T. and Margolus N. Programmable matter. Physica D 47 (1991) 263-272.




Self-Repairing Multicellular Hardware: A Reliability Analysis

Cesar Ortega* and Andy Tyrrell

Department of Electronics
University of York
York, YO10 5DD, UK
{cesar, amt}@ohm.york.ac.uk



Abstract. Artificial Life explores the characteristics of living organisms to 
understand not only life as it is, but also as it could be. Embryonics is a 
proposal for a new generation of fault-tolerant Field-Programmable 
Multicellular Arrays inspired by nature and appropriate for Artificial Life 
research. Embryonic arrays use hardware redundancy and array reconfiguration 
mechanisms to achieve fault tolerance. In this paper the k-out-of-m reliability 
model is used to analyse the reconfiguration strategies used in embryonic 
arrays. Two schemes are investigated: row- or column-elimination and cell- 
elimination. The models proposed can also be used to analyse the reliability of 
systems with spares other than embryonic arrays. 



Introduction 

Artificial Life pursues a twofold objective: on the one hand, it investigates the properties of natural life by studying emergent behaviours in computer simulations and artificial creatures; on the other, it applies the mechanisms that sustain life to the design of practical solutions in engineering. In this way, Artificial Life studies not only life as it is, but also life as it could be.

The systematic study of artificial cellular systems, such as cellular automata, neural networks or processor arrays, has gained momentum in Artificial Life studies during the past few years. The goal is to understand the emergent behaviours observed in natural cellular systems. Borrowing the main principles sustaining these mechanisms and applying them to the design of electronic systems could result in a new approach to the design of fault-tolerant systems [1].

In hardware redundancy physical spare components are used to replace the faulty 
ones. Most hardware redundancy reconfiguration techniques rely on a central 
processor performing the diagnosis of the cells and executing the algorithms to 
reconfigure the array in case of failure [2]. An alternative approach inspired by 
multicellular organisms is to distribute the tasks of diagnosis and reconfiguration 
among all the cells in the array, eliminating the single point of failure and added 
complexity introduced by the central processor. A reliability model for such 
architectures is presented in this paper. 



' Sponsored by Mexico’s Government under grants CONACYT-! 1 1 183 and llE-961 1310226. 
Embryonics

Embryonics introduces a new family of fault-tolerant field-programmable gate arrays (FPGAs) inspired by nature [3]. Its main ideas come from the mechanisms sustaining the embryonic development of multicellular organisms. When biological multicellular organisms reproduce, the new individual is formed out of a single cell (the fertilised egg). During the days that follow conception, the egg divides, passing to every offspring cell a copy of the DNA of the individual under development. Cells differentiate according to “instructions” stored in their DNA: different parts of the DNA are interpreted depending on the position of the cell within the embryo [4]. Before differentiation, cells are (theoretically) able to take over any function within the body, because each one possesses a copy of the DNA.

Correspondingly, every cell in an embryonic array stores not only its own configuration register, but also those of its neighbours. To differentiate, every cell selects a configuration register according to its position within the array. Position is determined by a set of co-ordinates that are calculated from the co-ordinates of the nearest neighbours. Every embryonic cell performs self-checking continuously. If a failure is detected, the faulty cell issues a status signal that eliminates some cells according to the reconfiguration mechanism in use, e.g. cell elimination. The surviving cells recalculate their co-ordinates and select a new configuration register. By doing so every cell performs a new function and, if there are enough spare cells to replace all the failing cells, the overall functionality of the original array should be preserved [5]. A detailed description of the Embryonics architecture can be found in [6].
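The coordinate-based differentiation described above can be illustrated for a single row under cell elimination. This is a hypothetical software sketch, not the Embryonics hardware of [6]: each live cell takes the co-ordinate passed on by its nearest live neighbour to the left, faulty cells are bypassed, the cell with co-ordinate x selects configuration `genome[x]`, and live cells beyond the k that are needed act as spares.

```python
from typing import List, Optional


def differentiate_row(alive: List[bool], genome: List[str]) -> Optional[List[Optional[str]]]:
    """Assign configuration registers along one row after faults.

    alive: one flag per physical cell (False = faulty, eliminated).
    genome: the configurations of the k logical cells in this row.
    Returns the configuration selected by each physical cell (None for faulty
    or spare cells), or None if too few cells survive, i.e. the row itself
    would have to be eliminated.
    """
    k = len(genome)
    selected: List[Optional[str]] = []
    x = 0  # co-ordinate received from the nearest live neighbour on the left
    for ok in alive:
        if not ok:
            selected.append(None)  # faulty cell: bypassed, co-ordinate passes through
            continue
        selected.append(genome[x] if x < k else None)  # extra live cells are spares
        x += 1
    return selected if x >= k else None


print(differentiate_row([True, False, True, True], ["A", "B", "C"]))
```

After the fault in the second cell, the functions "B" and "C" shift one cell to the right and the former spare takes over "C".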



The k-out-of-m Reliability Model 

In many situations, a system with m units will work properly as long as k out of the m function correctly. For identical units with success probability p(t), the probability of exactly k units working correctly out of m is given by (1) [7]:

$P(k, m, p(t)) = \binom{m}{k} p(t)^k (1 - p(t))^{m-k} \qquad (1)$

In the general case, the system remains functional as long as k, k+1, ..., m-1 or m units function correctly. Therefore, the probability of system success is

$R(t) = \sum_{i=k}^{m} \binom{m}{i} p(t)^i (1 - p(t))^{m-i} \qquad (2)$

The reliability of electronic systems is generally assumed to follow the exponential law

$p(t) = e^{-\lambda t} \qquad (3)$

where λ is a constant known as the failure rate. λ depends on parameters that describe the physical and operating characteristics of a device.

Substituting (3) in (2) yields the reliability of a k-out-of-m electronic system:

$R(t) = \sum_{i=k}^{m} \binom{m}{i} e^{-i\lambda t} (1 - e^{-\lambda t})^{m-i} \qquad (4)$

Figure 1 shows some graphs of (4) for different values of k, m and λ.

Fig. 1. a) different values of k; b) different values of λ



Figure 1(a) shows that, in a row with m cells, an exponential increase in the number of active cells implies a linear deterioration of reliability. Figure 1(b) shows the high reliability associated with small values of λ; however, decreasing λ requires an improvement in the quality of the system’s components and, in the majority of cases, the associated cost is too high.



Analysis of Reconfiguration Strategies 



In the following analysis, an array of size n × m will be considered in working order only if at least a sub-array of size r × k is working correctly. In FPGAs m and n are fixed, whereas r and k define a logical array mapped onto the physical one.
Fig. 2. Fault tolerance by a) row elimination and b) cell elimination
Figure 2(a) shows that in row elimination the failure of one cell causes the elimination of the corresponding row, and cells are logically shifted upwards. Cells in a row are connected in series; therefore the row's reliability is the product of the reliability distributions of all its cells. If all the cells in a row are identical and have an exponential reliability distribution, then the row's reliability is $e^{-m\lambda t}$. The reliability of the whole array, $R_{ar}(t)$, is then given by (2):

$R_{ar}(t) = \sum_{j=r}^{n} \binom{n}{j} e^{-jm\lambda t} (1 - e^{-m\lambda t})^{n-j} \qquad (5)$
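A direct transcription of equation (5) is sketched below: each row of m cells in series survives with probability e^{-mλt}, and the array survives while at least r of its n rows do. The function name and the example values are our own illustrations.

```python
import math


def row_elimination_reliability(r: int, n: int, m: int, lam: float, t: float) -> float:
    """Equation (5): n rows of m series-connected cells; the array needs r working rows."""
    row = math.exp(-m * lam * t)  # series row: all m cells must work
    return sum(math.comb(n, j) * row**j * (1 - row) ** (n - j) for j in range(r, n + 1))


# A 5 x 4 physical array needing 4 rows (one spare row) versus needing all 5.
print(round(row_elimination_reliability(4, 5, 4, lam=0.001, t=50.0), 4))
print(round(row_elimination_reliability(5, 5, 4, lam=0.001, t=50.0), 4))
```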

Even though row elimination logically removes many good cells when a fault 
occurs, the algorithm to carry it out is very simple and, therefore, fast and easy to 
implement in hardware. In addition, as the array becomes larger (more than 100 
cells), the percentage of cells lost during reconfiguration decreases dramatically. This 
is because, in square arrays, the size of the array grows quadratically, whereas the 
number of lost cells grows linearly with respect to the number of cells per side, n. 
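As a rough check of this argument, assume a square n × n array in which a single fault triggers the elimination of one row. That row holds n of the n² cells, so the fraction of cells lost per eliminated row is

```latex
\frac{\text{cells in one row}}{\text{cells in the array}} = \frac{n}{n^{2}} = \frac{1}{n}
```

i.e. 10 % for a 10 × 10 array (100 cells) but only 1 % for a 100 × 100 array (10,000 cells).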

Figure 2(b) shows cell elimination in an array with one spare cell per row and one spare row. In cell elimination, spare cells replace faulty cells in two stages. First, spares located in the same row replace faulty cells. When the number of faulty cells in a row exceeds the number of spare cells, the row is eliminated and rows are logically shifted upwards. Each row of the array is itself a k-out-of-m system. The reliability of every cell is given by (3); therefore the reliability of each row, $R_r(t)$, is given by (4). The reliability of the array, $R_{ac}(t)$, is given by

$R_{ac}(t) = \sum_{j=r}^{n} \binom{n}{j} R_r(t)^j (1 - R_r(t))^{n-j} \qquad (6)$
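Equation (6) nests one k-out-of-m computation inside another: first for the cells of a row, then for the rows of the array. The sketch below, with our own function names, repeats the binomial helper so that it stands alone, and checks numerically that k = m reduces it to the row-elimination case.

```python
import math


def at_least(k: int, m: int, p: float) -> float:
    """Probability that at least k of m independent units with reliability p work."""
    return sum(math.comb(m, i) * p**i * (1 - p) ** (m - i) for i in range(k, m + 1))


def cell_elimination_reliability(r: int, n: int, k: int, m: int, lam: float, t: float) -> float:
    """Equation (6): each of the n rows is a k-out-of-m system (equation (4));
    the array works while at least r rows work."""
    row = at_least(k, m, math.exp(-lam * t))  # R_r(t)
    return at_least(r, n, row)


lam, t = 0.001, 50.0
# With k == m (no spare cells per row) this reduces to row elimination.
print(math.isclose(cell_elimination_reliability(4, 5, 4, 4, lam, t),
                   at_least(4, 5, math.exp(-4 * lam * t))))
# One spare cell per row (k = 3 of m = 4) raises the array reliability.
print(cell_elimination_reliability(4, 5, 3, 4, lam, t) > cell_elimination_reliability(4, 5, 4, 4, lam, t))
```

The model does not capture the extra logic that cell elimination needs; that cost would enter only through a larger λ.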

Cell elimination provides a very efficient use of spare cells, but the complexity of the cells increases (with a corresponding increase in λ) due to the extra logic needed to re-route data after reconfiguration.




Fig. 3. Reliability distributions for cellular systems with spare units, 
a) k-out-of-m system b) r-out-of-n system 
For the case k = m (no spare cells in a row), equations (5) and (6) become equivalent, i.e. the row-elimination strategy is a particular case of cell elimination.

Figures 3(a) and 3(b) show that, as the number of cells increases, the reliability 
curve becomes steeper around the point where reliability equals 0.5. For arrays with 
hundreds of thousands of cells, the reliability curve approximates a rectangle of 
height 1 and width equal to the time at which reliability reaches 0.5. Figure 3(b) also 
shows the close relationship between row reliability and array reliability. This 
indicates that the reliability of the rows determines the main features of the array's 
reliability graph. 
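The steepening follows from binomial concentration. As an illustration that is ours rather than the paper's, consider n identical units of which a fixed fraction (here a hypothetical 9/10) must work: as n grows, system reliability approaches 1 whenever unit reliability is above that fraction and 0 whenever it is below, so the curve tends to a step. The sketch works in log space so that large n does not overflow.

```python
import math


def at_least(p: float, r: int, n: int) -> float:
    """P(at least r of n independent units work), each with probability 0 < p < 1.

    Binomial terms are computed in log space with lgamma to avoid overflow.
    """
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for j in range(r, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                    + j * log_p + (n - j) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


for n in (10, 100, 1000):
    r = 9 * n // 10                      # hypothetical: 90% of units must survive
    before = at_least(0.93, r, n)        # unit reliability still above 0.9
    after = at_least(0.87, r, n)         # unit reliability already below 0.9
    print(f"n={n:5d}  R(p=0.93)={before:.4f}  R(p=0.87)={after:.4f}")
```

Since unit reliability $e^{-\lambda t}$ decreases monotonically with time, the jump in system reliability around p = 0.9 becomes a jump around a fixed time as n grows.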



Conclusions and future work 

Large hardware cellular systems are good candidate platforms to investigate, in 
real time, the emergent behaviours characteristic of bio-inspired systems. Embryonics 
offers a good alternative for this kind of research. 

The distributed automatic reconfigurability characteristic of embryonic arrays 
offers considerable advantages over other reconfiguration strategies where, in most 
cases, a centralised agent, e.g. an operating system or central processor, must solve 
the problem of routing information. For reliability analysis purposes, the effects that 
this central router imposes on the system must be taken into account. 

The main characteristic of embryonic arrays that makes them different from other 
cellular architectures is the simplicity of individual cells. Simplicity implies small 
values of λ and, consequently, good reliability figures. A design of an embryonic cell 
that stresses simplicity can be found in [8]. 

The reliability models presented in this paper can be adapted to other classes of 
cellular systems. Further research must be carried out in order to determine to what 
extent the models proposed hold for any fault-tolerant cellular system with spares. 
Abstract. In this paper, we present a possible implementation of arithmetic 
functions (notably, addition and multiplication) using self-replicating 
cellular automata. The operations are performed by storing a dedicated 
program (sequence of states) on self-replicating loops, and letting the loops 
retrieve the operands, exchange data among themselves, and perform the 
calculations according to a set of rules. To determine the rules required for 
addition and multiplication, we exploited an existing algorithm for 
computation in the cellular automata environment and adapted it to exploit 
the features of self-replicating loops. This approach allowed us to study a 
variety of issues (synchronization, data exchange, etc.) related to the use of 
self-replicating machines for complex operations. 



1 Introduction 

The history of self-replicating cellular automata (CAs) has been marked by two 
major events. The first is von Neumann's development of his universal constructor 
[1], an automaton capable at the same time of universal construction, that is, of 
constructing any other automaton given its description (and hence a copy of itself 
given its own description), and of universal computation, that is, of executing any 
given application. This automaton, unfortunately handicapped by its great 
complexity, was the starting point for much of the further research in the field 
[2,3,4]. The second major event in the study of self-replicating CAs is Christopher 
Langton's development of the automaton known as Langton's loop [5], an automaton 
where the features of universal construction and universal computation were 
sacrificed for the sake of simplicity. The result is a small automaton capable 
exclusively of self-replication, extensively used and ameliorated by Langton's 
successors [6,7]. 

The motivation behind the study of self-replication in the environment of cellular 
automata is not immediately obvious, since this environment presents many features 
(e.g., the imbalance between the size of the memory required to store the transition 
rules and the functionality of a single cell) which render it somewhat cumbersome 
for most practical applications. Nevertheless, CAs do provide a rigid mathematical 
framework which can be very useful to systematically develop new approaches to 
the problem of self-replication, approaches which can then be transferred to more 
"conventional" and "practical" environments. This is, in fact, the motivation of our 
own research into the field of self-replicating automata. In particular, we have 
attempted to re-introduce computation to self-replicating automata in order to 
develop a mechanism allowing for self-replication in very large scale integrated 
circuits [8,9,10,11]. 

In the course of our research, we have developed a set of computationally useful 
self-replicating automata, notably by adding a Turing machine to Langton's loop 
[12], and, more importantly as far as this article is concerned, by developing a 
"programmable" automaton (Fig. 1), that is, a self-replicating automaton capable of 
storing and executing a user-specified program [10,13]. The versatility of this 
automaton, which we used extensively in the development of our hardware systems, 
was illustrated through a simple self-contained example (that is, a program with no 
external inputs) which, while useful as a demonstration tool, was nevertheless not 
very interesting from a computational point of view. In this article, we wish to show 
that it is indeed possible to perform computationally useful tasks using self- 
replicating automata by using our programmable automaton to execute some 
arithmetical operations, notably addition and multiplication, on binary numbers. 




Fig. 1. Our self-replicating automaton 

In order to implement these features, we used the particle model described by 
Steiglitz et al. [14], briefly introduced in the next section. We will then describe the 
implementation of this model in our automaton and its application to the operations 
of addition and multiplication, as well as to a combination of both. We will conclude 
with a few observations and remarks. 

2 Theoretical Notions 

To implement a binary addition function and a binary multiplication function on our 
automaton, we decided to exploit the model described by Steiglitz et al., since it is a 
model designed to operate within the CA environment and it can easily be adapted to 
self-replicating automata. 

Obviously, the details of the operation of Steiglitz's algorithm had to be modified to 
fit the automaton, but we essentially left untouched the overall approach to the 
execution of addition and multiplication. 

2.1 Binary Addition 

To explain the mechanism used to add binary numbers, we will start with a simple 
example of a sum of two one-bit numbers. This example is shown in Figure 2. 
Fig. 2. Sum of two one-bit numbers. 
To effect the sum, the two bits are stored in two cells which are moving towards 
each other. When the two cells collide, the right one is destroyed and the left one is 
transformed into a new right-moving cell which contains the result of the collision. 
The carry remains in place in the cell where the collision took place. In our example, 
the left cell represents a logic 1 and the right one a logic 0. At the collision, the sum 
is computed and the new right-moving cell represents a logic 1, the result of the 
computation. 
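Arithmetically, a single collision behaves as a half adder: the outgoing cell carries the exclusive OR of the two bits and the carry left in place is their AND. A minimal sketch of that rule, with a function name of our choosing (it models the bit arithmetic only, not cell motion):

```python
def collide(left_bit: int, right_bit: int) -> tuple[int, int]:
    """One collision of two one-bit data cells.

    Returns (result, carry): result travels on in the new moving cell,
    carry stays in the cell where the collision took place.
    """
    result = left_bit ^ right_bit   # sum bit
    carry = left_bit & right_bit    # carry bit
    return result, carry


print(collide(1, 0))  # the example in the text: 1 + 0
```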

For the sum of binary numbers coded on more than one bit, the left and the right 
addends are represented by a sequence of cells, each cell representing a bit (one or 
zero). The two sequences move towards each other. A processor cell is placed 
between the two sequences of cells (Fig. 3). In each number, the least-significant bit 
leads the sequence, so that when the two numbers collide head-on at the processor 
cell, the latter can add the bits in order of increasing significance. 

[Figure: left addend sequence | processor cell | right addend sequence] 

Fig. 3. Two data streams collide on one processor cell 

After a collision, the two incoming cells are destroyed, and the processor cell computes 
the result and generates a new left-moving cell, which encodes the result of the first 
addition. After the creation of the "answer" cell, the processor stores the value of the 
carry bit, which it will use to compute the result of the next collision between the 
bits of the two operands. 
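The behaviour of the processor cell can be sketched as a loop over bit pairs, least-significant bits first, with the carry held in the cell between collisions. This is a sequential sketch of the arithmetic only: it does not model cell motion, the names are ours, and emitting a leftover carry as one extra result bit is our assumption (the text does not say how the final carry leaves the processor).

```python
def processor_cell_add(left: list[int], right: list[int]) -> list[int]:
    """Add two binary numbers given as bit lists, least-significant bit first.

    Each loop iteration stands for one head-on collision at the processor cell.
    """
    width = max(len(left), len(right))
    left = left + [0] * (width - len(left))      # pad the shorter operand with zeros
    right = right + [0] * (width - len(right))
    carry = 0                                    # state stored in the processor cell
    answer = []                                  # the left-moving "answer" cells
    for a, b in zip(left, right):
        total = a + b + carry
        answer.append(total & 1)                 # emitted result bit
        carry = total >> 1                       # kept for the next collision
    if carry:
        answer.append(carry)                     # assumption: final carry is emitted too
    return answer


# 3 + 1 = 4, written LSB first: [1, 1] + [1] -> [0, 0, 1]
print(processor_cell_add([1, 1], [1]))
```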



2.2 Binary Multiplication 

For binary addition, we have seen that a single processor cell was sufficient for the 
computation. In the case of binary multiplication, we need a stream of processor 
cells, more precisely twice as many as the number of bits of the multiplicands. 
Figure 4 shows the starting configuration of the multiplication. 



[Figure: left multiplicand | processor stream | right multiplicand] 

Fig. 4. Two data sequences collide in a processor stream 



In this figure, the left- and right-moving sequences of cells represent the two 
multiplicands, and the processor stream is placed between the two sequences. To 
perform the multiplication, the two multiplicands travel across all the processor cells. 
When two cells collide in a processor cell, the latter computes the result according to 
the rules shown in Table 1. The two data cells then continue to travel across the 
processor stream. 
When all the cells have traveled through the entire processor stream, the result of the 
multiplication is represented by the states of the processor stream’s cells. Figure 5 
shows an example of the multiplication of two 2-bit numbers. 
Table 1. Rules for the processor cells 



In Fig. 5, each row represents the state of the multiplication at time t. In this 
example, the processor cells can have three different states. At the beginning (t = 0), 
the cells' state is empty, while after the first collision the cells' state can be "1" or 
"0". At the end of the computation all of the processor cells are set to "1" or "0", and 
we can read the result on the processor stream: 11 × 11 = 1001, that is, in decimal 
notation, 3 × 3 = 9. 
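Table 1 is not reproduced here, so the sketch below does not implement its rules. Under our own assumptions it only illustrates what the processor stream ends up holding: with 2N processor cells, the partial product of bit j of one multiplicand and bit k of the other belongs to product position j + k, and carries ripple towards more significant cells. All names are hypothetical.

```python
def processor_stream_multiply(left: list[int], right: list[int]) -> list[int]:
    """Multiply two N-bit numbers given as bit lists, least-significant bit first.

    Returns the 2N bits held by the processor stream, least-significant first.
    """
    n = max(len(left), len(right))
    left = left + [0] * (n - len(left))
    right = right + [0] * (n - len(right))
    stream = [0] * (2 * n)                 # one state per processor cell
    for j, a in enumerate(left):
        for k, b in enumerate(right):
            position = j + k
            carry = a & b                  # partial product of bits j and k
            while carry:                   # ripple the bit into the stream
                total = stream[position] + carry
                stream[position] = total & 1
                carry = total >> 1
                position += 1
    return stream


# The example of Fig. 5: 11 x 11 = 1001 (3 x 3 = 9)
bits = processor_stream_multiply([1, 1], [1, 1])
print(bits[::-1])   # most-significant bit first
```

Because the product of two N-bit numbers is below $2^{2N}$, the carries never run past the last of the 2N cells, which matches the stream length given in the text.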
3 Implementation on Self-Replicating Loops 

The programmable automaton we mentioned in the introduction (Fig. 1) consists of 
two concentric square loops: an inert internal loop (the sheath) and an active external 
program loop, containing the program to be executed along with the information 
used to direct the self-replication process. To duplicate itself, the automaton sends 
out four constructing arms, which build four new sheaths in the cardinal directions. 
When the sheath is complete, the automaton sends out the information contained in 
the program loop (the external loop of the original automaton). Finally, the 
constructing arms retract, completing the self-replication process and letting the four 
new automata attempt their own self-replication in the four cardinal directions. 
When an arm finds an obstacle (the border of the cellular array or an existing 
automaton), it retracts, abandoning the replication attempt. The self-replication will 
thus end only when all the available space (the cellular array) has been filled. 

As we will see in this section, both the operation of our loop and that of Steiglitz's 
model had to be slightly modified to allow them to be merged, but the modifications 
were fairly minor and the basic concepts were not in the least altered. 



3.1 Addition 

To execute this function, each automaton is charged with computing the result of a 
single collision between two data cells, unlike the original algorithm, in which a 
single processor cell computed the entire result. The initial configuration of the adder 
(Fig. 6a) consists of a single loop, containing the program which implements the 
sum. This first loop is a slightly modified version of the original loop, in that it 
replicates in one direction only (downwards). As time progresses, a column of loops 
will be created. The replication process ends when the last automaton finds, in the 
place where it should replicate, a special cell (Fig. 6b). Upon finding this special 
cell, the bottom automaton generates a START signal which propagates upwards to 
the first automaton to tell it to begin the operation. 
Fig. 6. Stream of automata: (a) initial configuration; (b) end of self-replication 

Once the first automaton has received the START signal, it looks to its left to find 
the bits it needs to add. It extends its constructing arm (Fig. 7a), retrieves the first bit 
it finds (least significant bit of the first number) and adds it to the second bit it finds 
(least significant bit of the second number). The arm then leaves in place the result 
of the computation and brings the carry bit back to the loop (Fig. 7b), which will 
propagate it to the next automaton (Fig. 7c). The process continues until the bottom 
loop is reached, signaling the end of the sum. 

Once the operation is complete, the bottom loop will extend an arm downwards (as if 
to propagate the carry bit). The arm will meet one of three kinds of cells: a new 
START cell, which will activate a new sum; an END cell, which halts the operation 
of the automata; or an ACTIVATE cell, whose function will be explained 
below. 
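The distributed version of the sum can be sketched as a column of loops, each handling one collision and passing its carry to the loop below, after which the cell met by the bottom loop's arm decides what happens next. This is an illustrative sketch only: representing the START, END and ACTIVATE cells as strings, the returned action descriptions, and the equal-length operands are our assumptions.

```python
def run_adder_column(left_bits: list[int], right_bits: list[int], terminal_cell: str):
    """Simulate a column of loops adding two equal-length numbers (LSB first).

    Loop i retrieves bit i of each operand, leaves the result bit in place,
    and passes its carry down to loop i + 1.
    """
    column_height = len(left_bits)         # one loop per bit position
    results = []
    carry = 0
    for i in range(column_height):
        total = left_bits[i] + right_bits[i] + carry
        results.append(total & 1)          # left in place by the constructing arm
        carry = total >> 1                 # brought back and propagated downwards
    # The bottom loop extends an arm downwards and reads the cell it meets.
    if terminal_cell == "START":
        next_action = "start a new sum"
    elif terminal_cell == "END":
        next_action = "halt"
    elif terminal_cell == "ACTIVATE":
        next_action = "signal the multiplier"
    else:
        raise ValueError(f"unknown terminal cell: {terminal_cell!r}")
    return results, carry, next_action


print(run_adder_column([1, 0], [1, 1], "END"))   # 1 + 3, LSB first
```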




Fig. 7. Computation of a collision 



3.2 Multiplication 

As for the sum, the multiplication starts with a single loop, which replicates towards 
the right (unlike the sum) to create a stream of 2N automata (where N is the number 
of bits of the multiplicands). 

The multiplication algorithm requires that the first collision between the data cells 
occur at a specific automaton, namely the Nth automaton from the right. This 
introduces some synchronization problems, which complicate the execution 
considerably. The first complication is that a sequence of temporization signals, in 
the form of N-1 shifting cells, needs to be added in front of the left operand (Fig. 8). 
Fig. 8. Starting point of multiplication 
The operation begins when the self-replication process has ended (i.e., when the 
replicating automata have filled all the available space) and the leftmost automaton 
has received a START signal. At this point, the leftmost automaton (which we will 
call Loop 1) starts retrieving the data cells of the left operand and propagating them 
to the right. Throughout the multiplication, Loop 1 will keep retrieving and 
propagating the data cells at a frequency of one data cell every three time steps 
(where one time step is the time required for an automaton to extend and retract its 
constructing arm). 

The first shifting cell (the first cell of the left operand to be retrieved) then 
propagates to the right until it reaches the rightmost automaton (which we will call 
Loop 2N). Upon receiving the shifting cell, Loop 2N retrieves the first cell of the right 
operand and stores it. Each of the shifting cells traversing the automata will cause the 
right operand data cells to be shifted from the loop they are on to the loop on its left, 
and a new data cell to be retrieved by Loop 2N. 

After N-1 shifting cells have gone through, all the bits of the right operand 
(except for the last one) are thus distributed on the N-1 rightmost loops. When the 
first left operand data cell arrives (behind the shifting cells) on Loop N+1 (the Nth 
automaton from the right), the first collision occurs (Fig. 9). 
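The bookkeeping of the shifting cells can be checked with a small sketch. It assumes loops numbered 1 to 2N from left to right, right-operand bits retrieved least-significant first, and, for each shifting cell that reaches Loop 2N, a shift of the stored cells one loop to the left followed by one retrieval; timing (three time steps per data cell) is not modeled, and the names are hypothetical.

```python
def distribute_right_operand(right_bits: list[int]):
    """Place right-operand bits on loops 1..2N after the N-1 shifting cells.

    right_bits is least-significant bit first. Returns (stored, pending):
    stored maps loop index -> bit, pending holds bits not yet retrieved.
    """
    n = len(right_bits)
    last_loop = 2 * n                      # Loop 2N, the rightmost automaton
    stored: dict[int, int] = {}
    pending = list(right_bits)
    for _ in range(n - 1):                 # one pass per shifting cell reaching Loop 2N
        # shift the data cells already stored one loop to the left ...
        stored = {loop - 1: bit for loop, bit in stored.items()}
        # ... and let Loop 2N retrieve the next right-operand cell
        stored[last_loop] = pending.pop(0)
    return stored, pending


stored, pending = distribute_right_operand([1, 0, 1, 1])   # N = 4
print(stored)    # {6: 1, 7: 0, 8: 1}
print(pending)   # [1]
```

With N = 4 the least-significant right-operand bit ends on Loop 6 = N + 2, immediately to the right of Loop N + 1, which is where the first left-operand data cell collides with it.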

The collision process occurs between a data cell A on one loop and a data cell B on 
the loop to its right, according to the rules shown in Table 1. At the end of the 
collision process, the result of the collision and data cell B are stored on the left loop, 
along with a possible carry bit (which will be taken into account when computing the 
next collision), while data cell A has been propagated to the right loop, where it will 
be used for the next collision. Each left operand data cell will thus collide with each 
right operand data cell, and the right operand will be shifted by one automaton to the 
right after being traversed by each left operand data cell. At the end of the 
multiplication, the right operand, stored on the N rightmost loops of the automaton, 
will be deleted by a special CLEAR cell, and the result will be stored on each of the 
loops. 




Fig. 9. Collision between two data cells 
3.3 Combinations of Multiplication and Addition 

In order to render its operation more "useful", our automaton was conceived so as to 
be able to realize combinations of operations. In particular, it can compute the 
multiplication of two results of sums. That is, it can compute any function of the 
form: 

$$(A + B + \dots) \times (a + b + \dots)$$

In order to compute this kind of function, we need a starting configuration similar to 
Fig. 10, which expands to the machine shown in Fig. 11 after self-replication. 





Fig. 11. End of self-reproduction 
When the left and the right automata have completed their sums, they leave a special 
ACTIVATE cell (mentioned above) for the multiplier to retrieve. The latter will 
interpret this cell as a START signal, and execute the multiplication. 

The two operations can thus be chained without difficulty, the only new feature 
being a carriage cell which will "reformat" the data generated by the adders into a 
form which the multiplier can use as an input (Fig. 12). 
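Functionally, the chained machine evaluates (A + B + ...) × (a + b + ...). The sketch below mirrors only that data flow, two adder stages whose results are handed to a multiplier, on bit lists stored least-significant bit first. Reducing the carriage cells' reformatting to padding both sums to a common width N is our simplification, and all names are hypothetical.

```python
from functools import reduce


def add_bits(x: list[int], y: list[int]) -> list[int]:
    """Ripple-carry addition of two LSB-first bit lists."""
    width = max(len(x), len(y))
    x, y = x + [0] * (width - len(x)), y + [0] * (width - len(y))
    out, carry = [], 0
    for a, b in zip(x, y):
        total = a + b + carry
        out.append(total & 1)
        carry = total >> 1
    return out + [carry] if carry else out


def multiply_bits(x: list[int], y: list[int]) -> list[int]:
    """Shift-and-add multiplication of two LSB-first bit lists."""
    product = [0]
    for shift, bit in enumerate(y):
        if bit:
            product = add_bits(product, [0] * shift + x)
    return product


def sum_times_sum(left_terms: list[list[int]], right_terms: list[list[int]]) -> list[int]:
    """(A + B + ...) * (a + b + ...): two adder stages feeding one multiplier."""
    left_sum = reduce(add_bits, left_terms)      # left group of adders
    right_sum = reduce(add_bits, right_terms)    # right group of adders
    n = max(len(left_sum), len(right_sum))       # simplified "carriage": common width N
    left_sum = left_sum + [0] * (n - len(left_sum))
    right_sum = right_sum + [0] * (n - len(right_sum))
    return multiply_bits(left_sum, right_sum)


def from_bits(bits: list[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


# (1 + 2) * (2 + 1), operands written LSB first
print(from_bits(sum_times_sum([[1], [0, 1]], [[0, 1], [1]])))
```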




Fig. 12. Operation of the carriage cells. 




4 Conclusion 

The goal of the work presented in this article was to show that it is possible, and 
indeed not exceedingly difficult, to exploit the capabilities of self-replicating 
automata (and notably our self-replicating programmable loops) to perform complex 
mathematical operations. To demonstrate this, we implemented the arithmetic 
operations of addition and multiplication using the algorithm described by Steiglitz 
et al. The resulting machines, while relatively complex (the final number of states 
required for combined sum and multiplication exceeds 30, including the states used 
only for self-replication), are nevertheless simple enough to be entirely simulated, 
and the support provided by the programmable loops considerably simplified the 
search for the relevant transition rules. 

It should be noted that, while the automaton we designed is simple enough for 
simulation, it is extremely unlikely that such a system would ever actually be used 
for real-world computing. As we have mentioned, cellular automata are a useful 
environment for theoretical research, but their real-world applications are few 
and not usually concerned with complex mathematical operations. Moreover, "pure" 
cellular automata do not contemplate the existence of external inputs, i.e., of data, 
such as mathematical operands, that are not present in the cellular space at time 0 
(in our system, for example, the operands should clearly be inserted as needed, 
which would considerably simplify the operation of the automata). 

Our aim, however, was not to develop a cellular automaton to be used in real-world 
applications. As mentioned in the introduction, our goal in studying this kind of 
structures is to determine what the advantages and constraints are in the use of 
self-replicating machines for complex operations, so as to be able to transfer these 
observations to the design of self-replicating integrated circuits. From this 
perspective, the work we presented is indeed interesting, in that it allows a number 
of observations: 

1 Self-replication can be advantageously exploited to realize application-specific 
parallel systems by associating a self-replication mechanism and an execution 
unit. 
2 The execution units need not be very powerful, as complex operations can be 
performed by many small identical units (the fundamental principle of 
parallelism). 

3 Self-replication allows the systems to adapt their architecture to the problem 
(for example, by producing the correct number of execution units to exactly fit 
a given problem). 

4 The problem of synchronizing the operation of all the units of the system is a 
major issue, as is the communication between the units. 

This kind of information has been, and will be, extremely useful in the development 
of self-replicating machines and in our attempt to realize von Neumann's dream. 
Abstract. An artificial genome with biologically plausible properties 
was developed and the dynamics of gene expression were studied. The 
model differs from previous approaches, such as Random Boolean Nets 
[1], in that it is entirely based on template matching in a nucleotide- 
like sequence. Genes activate or inhibit other genes by binding to their 
regulatory sequences. 

The results of the experiments suggest that many features of real-life 
development, such as cyclic gene activity, differentiation into multiple 
cell types, and robustness, may be inherent properties of a template- 
matching system rather than necessarily designed from scratch by 
Natural Selection. Moreover, the system may provide a new hypothesis about 
the role of junk DNA in real genomes. In addition to these biological im- 
plications, the approach used here is thought to provide a flexible basis 
for future simulations of morphogenesis. 



1 Introduction 

Since the mid-1980s, the advances in developmental genetics have been 
staggering. The number of genes discovered is rapidly growing, and for organisms such 
as yeast [2] or recently the nematode C. elegans [3] the complete genome has 
been sequenced and the genes mapped. However, despite this wealth of data, a 
conceptual framework as to how genes interact is lacking [4]. 

Pioneering work on viewing genomes as complex networks and studying their 
dynamics was done by Kauffman [1] [5], using Random Boolean Networks as 
an abstraction. However, his results and the concomitant suggestions - such 
as viewing cell types as attractors - were only appreciated to a limited extent 
in the biological community. It can be argued that the reasons for this were 
a) the abstract nature of Random Boolean Networks, which to many do not 
exhibit sufficient parallels with real genetic networks, and b) the limited 
extensibility of the model. 

The model presented here therefore tries to provide a biologically more 
plausible framework for studying gene interactions, and thus to integrate 
developmental biology and the study of complex systems, which was recently stated as 
one of the major requirements for our understanding of ontogeny [6]. 







2 The Model 

The main goal of the model is to study gene activity over time. Genes regulate 
each other by binding to regulatory sequences. 

At the core of the model is an Artificial Genome consisting of a string of digits, which 
is randomly created and contains all the information present in the model. Genes 
are not pre-specified, but identified in the genome after creation. The genome 
is searched for occurrences of the sequence ‘0101’, and on encountering one, the 
following N digits are defined as a gene. The product of a gene is yielded by a 
simple transformation of the gene sequence. In the current implementation, each 
digit is simply incremented by 1, so that the gene sequence ‘221133’ becomes a 
‘332244’ protein. Now, for each existing gene product, all matching regulatory 
sequences in the genome are identified and stored. For example, for the gene 
above, these would be all occurrences of the sequence ‘332244’, each of which 
controls the gene immediately following it. 
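The gene-finding and matching procedure can be sketched directly on a digit string. Assumptions that are ours, not the paper's: digits are single characters (base at most 10), the increment wraps modulo the base (the text does not say what happens to the largest digit), a regulatory site controls the first gene whose promoter starts at or after the end of the site, and the mode of regulation uses the example setting in which products ending in '1' inhibit. All names are hypothetical.

```python
PROMOTER = "0101"


def find_all(text: str, pattern: str) -> list[int]:
    """Start positions of every (possibly overlapping) occurrence of pattern."""
    positions, pos = [], text.find(pattern)
    while pos != -1:
        positions.append(pos)
        pos = text.find(pattern, pos + 1)
    return positions


def find_genes(genome: str, gene_length: int) -> list[tuple[int, str]]:
    """Each '0101' marks a gene: the gene_length digits that follow it."""
    genes = []
    for pos in find_all(genome, PROMOTER):
        start = pos + len(PROMOTER)
        sequence = genome[start:start + gene_length]
        if len(sequence) == gene_length:         # ignore genes cut off by the end
            genes.append((start, sequence))
    return genes


def gene_product(sequence: str, base: int) -> str:
    """Increment every digit by 1 (wrapping modulo the base is an assumption)."""
    return "".join(str((int(d) + 1) % base) for d in sequence)


def is_inhibitor(product: str, inhibitory_digits: frozenset = frozenset("1")) -> bool:
    """Mode of regulation set by the product's last digit."""
    return product[-1] in inhibitory_digits


def regulatory_links(genome: str, genes: list[tuple[int, str]], base: int) -> dict[int, list[int]]:
    """Map each gene index to the indices of the genes its product regulates."""
    links = {}
    for i, (_, sequence) in enumerate(genes):
        targets = []
        for site in find_all(genome, gene_product(sequence, base)):
            site_end = site + len(sequence)
            following = [j for j, (start, _) in enumerate(genes)
                         if start - len(PROMOTER) >= site_end]
            if following:                        # the gene immediately downstream
                targets.append(following[0])
        links[i] = targets
    return links


genome = "0101221133" + "332244" + "0101000000"
genes = find_genes(genome, gene_length=6)
print(genes)                                    # [(4, '221133'), (20, '000000')]
print(regulatory_links(genome, genes, base=5))  # {0: [1], 1: []}
```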

Upon completion of this procedure, each gene will be regulated by 0 to n 
regulatory elements, or more precisely by the genes coding for the matching 
gene products. Binding of a gene product to a regulatory sequence can have 
two effects on the future expression of the regulated gene. The regulatory unit 
can either act as an enhancer, thus activating the gene, or as an inhibitor, thus 
blocking its activation. Which of the two modes applies depends on the 
value of the last digit of the gene product. For example, one possible setting is 
to define all genes ending with ‘1’ as inhibitors. Regulation is not concentration 
dependent: the presence of a single enhancer suffices to activate a gene. However, 
inhibition has priority over enhancement. 



2.1 Biological Plausibility 

The concept of a standard promoter (‘0101’) is used to define genes in the genome 
string. The equivalent in eukaryotic genomes is the so-called TATA box (a suc- 
cession of ‘TA’ nucleotide pairs), which is the major component of every gene’s 
promoter (allowing RNA polymerase to bind for transcription). Furthermore, in 
the model, regulation is achieved by the binding of gene products to matching 
sequences of the genome. This corresponds to the action of transcription factors 
in DNA-based genomes. Obviously, in the case of the latter, template match- 
ing is more sophisticated since it is determined by the protein structure of the 
gene product rather than by a simple sequence-sequence correspondence. 
However, the idealisation in the model retains the fundamental concept of template 
matching and is therefore sufficient for the purpose. 

Genes are under the control of regulatory sequences (also called cis-elements), 
which in eukaryotic genomes can be either located before (upstream) or af- 
ter (downstream) the actual gene. Here, regulatory sequences are only located 
upstream. Again, this is a simplification that does not affect the fundamental 
principle. 

Finally, regulatory sequences can act as enhancers or inhibitors, a concept 
that is again mirrored in DNA-based genomes. The latter differ, however, in that 
they allow complex interactions between regulatory elements as well as 
concentration-dependent regulation. For simplicity, this is not the case in the 
current implementation of the model. 

Fig. 1. Gene expression and regulation in the Artificial Genome 



2.2 Studying gene expression 

After a genome has been set up according to the rules outlined above, gene 
expression over time can be studied. To do this, initially all genes but one are 
deactivated. It is then determined which genes are regulated by the sole active 
gene, and these are marked as on or off (according to the mode of regulation, 
activation or inhibition) and updated accordingly in the next time step. The 
same procedure is repeated for every active gene of the next time step and 
analogously throughout a whole run of a specified number of cycles. The result is 
a pattern of expression, which is made visible in an expression graph (Fig. 2). 
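A sketch of the synchronous update, under our own reading of the rules: a gene is on at the next time step if at least one active gene enhances it and no active gene inhibits it (inhibition has priority), and genes receiving no input from active genes are off. The text does not state what happens to genes without active regulators, so that last rule is an assumption, as are the data layout (`links[g]` lists the genes regulated by g) and all names.

```python
def step(active: set[int], links: dict[int, list[int]], inhibitors: set[int]) -> set[int]:
    """One time step; genes listed in inhibitors repress their targets."""
    enhanced, inhibited = set(), set()
    for gene in active:
        for target in links.get(gene, []):
            (inhibited if gene in inhibitors else enhanced).add(target)
    return enhanced - inhibited      # one enhancer suffices; inhibition wins


def run(start_gene: int, links: dict[int, list[int]], inhibitors: set[int], cycles: int) -> list[set[int]]:
    """Start with a single active gene and record the active set at each step."""
    history = [{start_gene}]
    for _ in range(cycles):
        history.append(step(history[-1], links, inhibitors))
    return history


def expression_graph(history: list[set[int]], n_genes: int) -> None:
    """Text expression graph: one row per gene, one column per time step."""
    for gene in range(n_genes):
        print(f"{gene:3d} " + "".join("#" if gene in state else "." for state in history))


toy_links = {0: [1], 1: [2], 2: [0]}     # hypothetical three-gene loop
expression_graph(run(0, toy_links, set(), 6), 3)
```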



2.3 Parameters 

The following parameters can be modified in the current implementation: genome 
size, gene length, base (i.e. range of digits) and degree of inhibition*. 

* Coded as a fraction of the base; i.e. a base of 4 and a degree of inhibition of 1 
means that, on average, one quarter of genes will be inhibitory. 
Fig. 2. Gene expression unfolding over time. Each gene is assigned a fixed position on 
the vertical axis (according to its number) 



Furthermore, two options make it possible to influence the system’s behaviour 
manually: manual toggle (switching a gene on or off for one time step) and 
knock-out (switching off a gene for the duration of an entire run). 

3 Behaviour of the Model 

The behaviour of the model is highly dependent on the parameter values. Three 
regimes can be distinguished: ordered, complex, and chaotic. As discussed below, 
the variable K, the average number of genes that each gene regulates, plays a 
crucial role in this respect. 
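K can be computed directly from the regulatory links of a genome. A minimal sketch, assuming `links[g]` lists the genes regulated by gene g (a hypothetical layout) and counting each regulated gene once even if several sites connect the same pair:

```python
def average_connectivity(links: dict[int, list[int]]) -> float:
    """K: the average number of distinct genes regulated by each gene."""
    if not links:
        return 0.0
    return sum(len(set(targets)) for targets in links.values()) / len(links)


print(average_connectivity({0: [1, 2], 1: [2], 2: []}))   # (2 + 1 + 0) / 3
```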



3.1 Ordered Gene Expression 

Gene expression is called ordered when genes are continuously active or inactive 
throughout a run (Fig. 3). 



Fig. 3. Ordered gene expression. size = 10,000; base = 4; gene length = 6; inhibition = 0; 
no. genes = 29; K = 1.103 



Ordered gene expression is observed if K, the degree of regulation, is low. 
This state is caused by three parameter settings (or their combinations): a small 
genome size, a high base, or long genes. Each factor exerts its effect 
by lowering the probability that a given gene product finds a matching 
counterpart. 

Naturally, ordered artificial gene expression is of only limited interest due to 
the lack of resemblance to real-life gene expression, which is considerably more 
dynamic. 



3.2 Chaotic Gene Expression 

At the other extreme of possible regimes, gene expression appears to be random 
and shows no signs of patterns; it is chaotic (Fig. 4). 
Fig. 4. Chaotic gene expression. size = 150,000; base = b; gene length = 6; inhibition = 3; 
no. genes = 225; K = 8.568 



Chaotic behaviour is observed if the number of genes and regulatory sequences 
is large, thus resulting in a high degree of connectivity (as indicated by a K of 
8.568 in Fig. 4). 

Analogously to the opposite case (the ordered state), parameter settings with 
a large genome size, a small base, and short genes cause the system to be in this 
regime. Like complete order, completely chaotic artificial gene expression is of 
little interest, since for genetic nets to perform a task (e.g. regulate the 
development of an organism), they must be able to carry information, a feat 
that is made impossible by complete randomness. 



3.3 Complex Gene Expression 

If the degree of gene regulation lies between the values for ordered and chaotic 
behaviour, the patterns of gene expression exhibit more complex dynamics. One 
striking feature of these complex systems is the tendency to produce cyclic gene 
activity (Fig. 5). Preliminary results suggest that the period of the cycle (i.e. 
the number of time steps it encompasses) is positively correlated with the number 
of genes. For example, while the expression pattern with 56 genes has a period 
of 10 (Fig. 5), the following one, based on 88 genes, has a period of 30 (Fig. 6). 




Fig. 5. Cyclic gene expression. size = 20,000; base = 4; gene length = 6; inhibition = 1; 
no. genes = 56; K = 3.285 




Fig. 6. Cyclic gene expression. size = 20,000; base = 4; gene length = 6; inhibition = 1; 
no. genes = 88; K = 3.303 



For an Artificial Genome in the complex regime, a given cycle of gene expression 
is converged upon from a number of different start genes (Fig. 7). It can therefore 
be viewed as a limit cycle attractor. Each genome typically exhibits several 
such attractors (Fig. 8), whose lengths were found to be distributed as 
shown in Figure 9. 
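Because the update is deterministic, a limit cycle can be found by iterating until a set of active genes repeats. The sketch below returns the transient length, the period, and the cycle itself, so that runs from different start genes can be checked for convergence on the same attractor. The toy network and all names are hypothetical; any deterministic update on sets of active genes can be passed in.

```python
LINKS = {0: [1], 1: [2], 2: [0], 3: [0]}   # hypothetical toy network, no inhibitors


def toy_update(active: frozenset) -> frozenset:
    """Next active set: every gene regulated by a currently active gene."""
    return frozenset(t for g in active for t in LINKS.get(g, []))


def find_attractor(start: frozenset, update, max_steps: int = 10_000):
    """Iterate a deterministic update until a state repeats.

    Returns (transient_length, period, cycle_states).
    """
    seen = {start: 0}                      # state -> first time step it occurred
    state = start
    for t in range(1, max_steps + 1):
        state = update(state)
        if state in seen:
            first = seen[state]
            cycle = frozenset(s for s, t0 in seen.items() if t0 >= first)
            return first, t - first, cycle
        seen[state] = t
    raise RuntimeError("no repeated state within max_steps")


print(find_attractor(frozenset({0}), toy_update)[:2])   # starts inside the cycle
print(find_attractor(frozenset({3}), toy_update)[:2])   # reaches it after one step
```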

The existence of multiple limit cycles in complex Artificial Genomes gives 
support to the notion developed by Kauffman [1] that cell types may be viewed 
as attractors of gene expression in biological gene nets. If this is the case, cell 
differentiation can be viewed as a dynamic system moving from one attractor to 
another. Differentiation of cells in organisms is known to rely to a large extent on 
external signals (in particular from other cells) that, through chemical cascades, 
cause the activation or deactivation of genes [7]. The extent to which such controlled 
cell differentiation is possible in the model developed here was then studied. To do this, 
the system was run until it reached a limit cycle attractor, and then manipulated 
by switching on or off the activation of single genes. It was found that controlled 
‘differentiation’ into a different attractor is indeed possible (Fig. 10). 

However, the experiments suggest that only a small proportion of gene activity
toggles produced a changed expression pattern. In the majority of cases, the
system was robust against these disturbances and quickly returned to its original
attractor (Fig. 11).

This robustness of gene expression is not just limited to the transient dis- 
ruptions carried out above, but is also observed if genes are completely knocked 
out (Fig. 12). 
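The toggle and knock-out experiments can be expressed compactly for any deterministic, synchronously updated gene net. The sketch below is an illustrative assumption rather than the model's code: states are tuples of 0/1 gene activities, `step` is a hypothetical update rule, a toggle flips one gene in a state on the current cycle, and a knock-out holds a gene permanently off. The helper names are hypothetical.

```python
from typing import Callable, FrozenSet, Tuple

State = Tuple[int, ...]
Step = Callable[[State], State]


def attractor(step: Step, state: State, max_steps: int = 100_000) -> FrozenSet[State]:
    """Iterate until a state repeats and return the states on the limit cycle."""
    seen = {}
    trajectory = []
    for t in range(max_steps):
        if state in seen:
            return frozenset(trajectory[seen[state]:])
        seen[state] = t
        trajectory.append(state)
        state = step(state)
    raise RuntimeError("no repeated state within max_steps")


def toggle_returns(step: Step, start: State, gene: int) -> bool:
    """Transient disturbance: flip one gene in a state on the cycle, rerun,
    and report whether the system falls back into the same attractor."""
    original = attractor(step, start)
    s = min(original)  # deterministic choice of a state on the cycle
    flipped = s[:gene] + (1 - s[gene],) + s[gene + 1:]
    return attractor(step, flipped) == original


def knock_out(step: Step, gene: int) -> Step:
    """Loss of function: wrap the rule so that `gene` is off after every update."""
    def mutant(state: State) -> State:
        nxt = step(state)
        return nxt[:gene] + (0,) + nxt[gene + 1:]
    return mutant


def project_out(cycle: FrozenSet[State], gene: int) -> FrozenSet[State]:
    """Drop one gene from every state so patterns can be compared after a knock-out."""
    return frozenset(s[:gene] + s[gene + 1:] for s in cycle)
```

For example, with the toy rule `settle = lambda s: (1, s[0], 0)` every toggle returns to the fixed point `(1, 1, 0)`, whereas the rotation rule `lambda s: s[1:] + s[:1]` is moved to a different cycle by a single toggle.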
Fig. 7. System converging on the same cycle despite different start genes. size = 100,000; base = 5; gene length = 6; inhibition = 2; no. genes = 152; K = 7.374




Fig. 8. Multiple attractors of gene expression. size = 100,000; base = 5; gene length = 6; inhibition = 2; no. genes = 175; K = 4.425




Fig. 9. Distribution of limit cycle lengths of 200 genomes. size = 100,000; base = 5; gene length = 6; inhibition = 2. The average number of attractors per genome is 5.425.






Fig. 10. Differentiation into a different attractor by transiently toggling a gene state (manual toggle of gene 31). size = 20,000; base = 4; gene length = 6; inhibition = 1; no. genes = 88; K = 3.30




Fig. 11. Gene expression is largely robust against disturbances (manual toggle of gene 8). size = 20,000; base = 4; gene length = 6; inhibition = 1; no. genes = 75; K = 4.173




Fig. 12. Robustness against loss of gene function (knock-out of gene 52). size = 20,000; base = 4; gene length = 6; inhibition = 1; no. genes = 56; K = 3.286
4 Discussion



The behaviour of random Artificial Genomes mirrors that of real gene expression 
to a large extent. The model shows that cyclic gene activity is an inherent 
property of the system and does not have to be especially evolved from scratch 
by selection. 

Furthermore, the experiments suggest that a high degree of robustness is possible
without selection. This is manifested in two ways: a) transient disturbances
of gene expression (carried out manually in the model) in the majority of cases
do not result in altered expression patterns, and b) loss of gene function (achieved
by manually knocking out genes) in most cases does not affect the overall pattern
of gene expression. While the first case implies that sporadic mistakes in
gene expression will not substantially affect the development of biological organisms,
the latter case may explain why, in developmental genetics, mutagenesis
experiments (resulting in the loss of gene function) so frequently fail to yield
observable deficiencies in development [7]. Traditionally,
such robustness is thought to have arisen from evolution [4]; it was shown here
that this need not be the case.

A further feature exhibited by the model is the ability to ‘differentiate’ into 
different attractors upon receiving external signals. The key to this is the activity 
of a few crucial genes (switch genes), the modulation of which causes the current
cycle to become unstable and change into a different expression pattern. In the 
experiments above, such modulation was achieved by manually switching on or 
off the respective genes. In biological genetic nets, on the other hand, it seems 
plausible to view signals external to the cycle as the source for this modulation, 
such as cell-cell interactions, diffusing morphogens or the unequal distribution 
of cytoplasmic determinants. Proceeding on the assumption that a cell type 
represents an attractor, this would open the way for Natural Selection to control 
cell differentiation by providing such signals, and making the genome responsive 
to them. 

A final interesting observation of these initial experiments is the amount of 
unused sequence in genomes with complex gene expression. It may be hypothesised
that this represents the equivalent of junk DNA in biological systems. The
existence of redundant sequence is a result of the central property of the model 
- template matching. In order for the system to reach the supracritical level [5] 
above which gene expression is not ordered, the number of genes and regulatory 
sequences must be sufficiently high. Large genomes increase the probability that 
gene products (transcription factors) find matching regulatory sequences. A nat- 
ural by-product of this modus operandi is the accumulation of redundant, unused 
sequence the more complex gene expression becomes. It is conceivable that Natural
Selection affects the complexity of gene expression by directly modifying the
genome size parameter. If this is the case in biological systems, then complexity
of gene regulation in organisms should be correlated with the amount of junk 
DNA. It is certainly too early to make definite judgements as to the validity of 
this notion, but it seems worth further investigation. 
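The claim that larger genomes make matches more likely can be checked with a back-of-the-envelope calculation. Assuming, as a simplification and not as a detail of the model, that the genome is a uniformly random string over `base` symbols, a fixed template of length L occurs at any given position with probability base^-L, so by linearity of expectation the expected number of occurrences is (size - L + 1) / base^L and grows linearly with genome size. For the parameters of Fig. 5 (size 20,000, base 4, gene length 6) this gives 19,995 / 4,096 ≈ 4.88 expected matches per template. The helper names are hypothetical.

```python
import random


def expected_matches(size: int, base: int, length: int) -> float:
    """Expected occurrences of one fixed template in a uniform random genome."""
    return (size - length + 1) / base ** length


def count_matches(genome: str, template: str) -> int:
    """Count (possibly overlapping) exact occurrences of template in genome."""
    L = len(template)
    return sum(1 for i in range(len(genome) - L + 1) if genome[i:i + L] == template)


if __name__ == "__main__":
    rng = random.Random(0)
    size, base, length = 20_000, 4, 6
    genome = "".join(str(rng.randrange(base)) for _ in range(size))
    template = "012301"  # hypothetical regulatory template of length 6
    print("expected:", expected_matches(size, base, length))
    print("observed:", count_matches(genome, template))
```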
An important next step is a quantitative analysis of the results presented
here. This work is currently in progress. Furthermore, the model lends itself to
the incorporation of genetic algorithms (GAs). In particular, we will study to
what extent selection can modify existing gene expression patterns (e.g. in terms
of cycle lengths or response to external signals). In addition, because of their
biological plausibility, we believe that Artificial Genomes of the type presented here
are particularly suited to being embedded in a framework of artificial ontogeny.



5 Conclusion 

The approach of studying the dynamics of gene expression with a model based 
on template matching has proven fruitful. The behaviour of the system mirrors 
that of biological genetic nets to a considerable extent, including features such as 
the existence of stable attractors, the ability to differentiate, and a high degree of 
robustness. However, although these features are inherent properties of the system,
they alone are by no means sufficient as a source of adaptation. Rather, they
should be seen as providing the raw material upon which Natural Selection can
act. We believe that extending the model with GAs, as well as embedding
it in a framework of artificial ontogeny, will provide valuable insights into the
principles of development.
Abstract. The genetic-algorithm-based method described in this paper
can be used to identify a set of parameters whose values define the gene
regulation circuit. To demonstrate the effectiveness of the approach we
chose Drosophila segmentation processes, for which we search for the
diffusion constant and transcription ratio of each gene.
The convergence characteristics were also investigated in order to find
out how to improve the method. The results suggest that (1) when the
gene regulatory network is hierarchically structured, the genetic algorithm
optimizes the upstream parameters earlier than the downstream ones in
the hierarchy, (2) some gene networks have a smooth concave error
surface with no local minima, and (3) the method can be used to test the
appropriateness of the assumed basic model.



1 Introduction 

The major questions we wish to address in this paper are the following: (1) Given
a relatively well understood gene regulatory network, can we identify a set of
parameters that reproduces spatial expression patterns for both wild-type and
mutant alleles? (2) What are the characteristics of the gene regulatory network
when a stochastic parameter optimization method such as a GA is applied to it?
Is there any feature that we can take advantage of when we need to work on
very large networks? (3) How does the appropriateness of the basic model
framework affect the fitness during the optimization? Would an arbitrary model
find a set of parameters that fits the expression patterns, so that we cannot tell
whether the basic model is good or bad? Or is convergence retarded when
an inappropriate model is used, enabling us to infer the quality of the basic model
framework from the optimization characteristics?
Fig. 1. Hierarchical structure of gene regulation



2 Hierarchical regulation of early segmentation genes 

Some of the genes involved in early segment formation in Drosophila form a
hierarchy. A cascade is initiated by the maternal effect genes, which regulate the
expression of gap genes. The gap genes interact among themselves to control
their expression. They also control the patterns of pair-rule gene expression.
The pair-rule genes interact among themselves to form the repetitive segmental
divisions of the body, and they also control the pattern of segment polarity gene
expression. The pair-rule and gap genes also interact to regulate the homeotic
genes that determine the structure of each segment. By the end of the cellular
blastoderm stage, each segment primordium has gained an individual identity by
virtue of its unique constellation of gap, pair-rule, and homeotic gene products. In
this manner, the genes interact with each other so that their products become
localized in specific regions of the egg. Figure 2 shows the gene expression
pattern along the anterior-posterior axis of the egg. The curves in the figure were
drawn by computer simulation, using a hand-tuned parameter set chosen so that the
simulated pattern is consistent with the real expression pattern obtained
from biological experiments.




Fig. 2. Expression pattern along the anterior-posterior axis of the Drosophila embryo.
The data are based on real experiments and a computer simulation.
3 Simulation Model



In this paper we focus on the cascade of maternal effect genes
(bcd, nanos), gap genes (hunchback, Kruppel, knirps, giant), and the pair-rule gene
even-skipped during early embryogenesis. Each is expressed under its
own regulatory mechanism; for example, hunchback is activated by bicoid but
repressed by nanos.

Taking into account activation by activators and repression by repressors,
the expression probability t_i of gene i is given by the following formula:



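$$
t_i = \frac{\sum_{a \in \mathrm{activator}} k_a C_a}{\theta_i + \sum_{a \in \mathrm{activator}} k_a C_a + \sum_{r \in \mathrm{repressor}} k_r C_r} \qquad (1)
$$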



where C is the concentration of an activator or repressor surrounding gene i,
k is its binding affinity for gene i, and θ_i is a constant for gene
i. Expression occurs when t_i exceeds a threshold value pre-assigned to
each gene. As a result of expression, products are transcribed from the genes.
The transcription products are confined within the egg. As the Drosophila
egg is longest along its anterior-posterior axis and has no partitions
inside it, we approximated it as a cylindrical container.

The free diffusion of the transcription products within the egg is described 
by the following formula: 



$$
\frac{\partial u_i}{\partial t} = D_i \frac{\partial^2 u_i}{\partial x^2} + g_i \cdot u_i + T_i(U), \qquad (2)
$$

where

$$
T_i(U) = \begin{cases} c_i & (t_i > \mathrm{threshold}_i) \\ 0 & (\text{otherwise}) \end{cases} \qquad (3)
$$

Here u_i is the concentration of the product of gene i, D_i is its diffusion constant,
g_i is its deletion function, T_i is its transcription function, c_i is its transcription
ratio, and U is the vector of all product concentrations. Equation (2) is applied to
every gene product in the simulator, so that the products form concentration
gradients along the x-axis (the anterior-posterior axis).
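A minimal sketch of one explicit time step of this 1-D reaction-diffusion model along the anterior-posterior axis. The numerical choices are assumptions, not the paper's: forward-Euler integration, zero-flux boundaries at both ends (the products are confined within the egg), the deletion term written as a loss -decay·u (i.e. a non-positive g_i), and a threshold transcription term that adds a constant rate wherever the expression probability exceeds the gene's threshold. Function names are hypothetical.

```python
from typing import List, Sequence


def transcription_term(t_i: float, threshold: float, rate: float) -> float:
    """Constant production where the expression probability exceeds the threshold."""
    return rate if t_i > threshold else 0.0


def diffusion_step(u: Sequence[float], D: float, decay: float,
                   production: Sequence[float], dx: float, dt: float) -> List[float]:
    """One forward-Euler step of du/dt = D d2u/dx2 - decay*u + production."""
    r = D * dt / dx ** 2
    if r > 0.5:
        raise ValueError("unstable explicit step: need D*dt/dx**2 <= 0.5")
    n = len(u)
    new = []
    for i in range(n):
        left = u[i - 1] if i > 0 else u[i]        # zero-flux (reflecting) boundary
        right = u[i + 1] if i < n - 1 else u[i]
        laplacian = left - 2.0 * u[i] + right
        new.append(u[i] + r * laplacian + dt * (-decay * u[i] + production[i]))
    return new


if __name__ == "__main__":
    t_values = [0.9, 0.7, 0.3, 0.1, 0.0]  # hypothetical expression probabilities per slice
    production = [transcription_term(t, threshold=0.5, rate=1.0) for t in t_values]
    u = [0.0] * len(t_values)
    for _ in range(200):
        u = diffusion_step(u, D=1.0, decay=0.1, production=production, dx=1.0, dt=0.1)
    print([round(x, 3) for x in u])
```

With decay and production set to zero the reflecting boundaries conserve the total amount of product, which is a convenient sanity check for the discretisation.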

A genetic algorithm is used to search for the optimal parameter set. The target
expression pattern is given by the biological information shown in Fig. 2.
The fitness is a function of an error value computed by pattern matching: at
each slice along the anterior-posterior axis, the distance between the simulated
value and the target value is calculated, and the error is the sum of the squares
of these distances over all genes and slices.

$$
(\mathrm{fitness}) = \sum_{\mathrm{genes}} \sum_{\mathrm{positions}} \mathrm{distance}^2
$$
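The error measure can be sketched as follows, assuming the simulated and target patterns are stored per gene as lists of concentrations sampled at the same slices along the anterior-posterior axis. Returning the per-gene values as well as the total mirrors the two panels of Fig. 3; the function names are illustrative.

```python
from typing import Dict, List

Pattern = Dict[str, List[float]]


def tss_per_gene(simulated: Pattern, target: Pattern) -> Dict[str, float]:
    """Sum of squared distances between simulated and target values, per gene."""
    result = {}
    for gene, wanted in target.items():
        got = simulated[gene]
        if len(got) != len(wanted):
            raise ValueError(f"{gene}: {len(got)} slices vs {len(wanted)} target slices")
        result[gene] = sum((s - t) ** 2 for s, t in zip(got, wanted))
    return result


def total_tss(simulated: Pattern, target: Pattern) -> float:
    """Error summed over all genes and all positions (lower is better)."""
    return sum(tss_per_gene(simulated, target).values())
```

For instance, `total_tss({"hb": [1.0, 2.0]}, {"hb": [0.0, 2.0]})` is 1.0, since only the first slice differs, by one unit.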

The selection strategy used in this paper is tournament selection. The population
size is 100 and the tournament size is 5; tournament selection is performed twice
to select the two parents to be mated. Crossover and mutation then
produce two offspring.
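The selection scheme can be sketched as a generational GA over real-valued parameter vectors (for example, diffusion constants and transcription ratios scaled to [0, 1]). Only the population size, the tournament size, and the two tournaments per mating come from the text; one-point crossover, Gaussian mutation, the mutation rate and width, and all function names are assumptions made for illustration.

```python
import random
from typing import Callable, List, Tuple

Genome = List[float]


def tournament(pop: List[Genome], errors: List[float], size: int, rng: random.Random) -> Genome:
    """Draw `size` distinct individuals and return the one with the lowest error (TSS)."""
    contestants = rng.sample(range(len(pop)), size)
    return pop[min(contestants, key=lambda i: errors[i])]


def crossover(a: Genome, b: Genome, rng: random.Random) -> Tuple[Genome, Genome]:
    """One-point crossover producing two children (assumed operator)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(g: Genome, rng: random.Random, rate: float = 0.1, sigma: float = 0.02) -> Genome:
    """Gaussian mutation of each gene with probability `rate`, clipped to [0, 1]."""
    return [min(1.0, max(0.0, x + rng.gauss(0.0, sigma))) if rng.random() < rate else x
            for x in g]


def run_ga(error_fn: Callable[[Genome], float], n_params: int, generations: int = 100,
           pop_size: int = 100, t_size: int = 5, seed: int = 0) -> Tuple[Genome, float]:
    """Generational GA minimising error_fn; returns the best genome seen and its error."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_params)] for _ in range(pop_size)]
    best, best_err = pop[0], float("inf")
    for _ in range(generations):
        errors = [error_fn(g) for g in pop]
        for g, e in zip(pop, errors):
            if e < best_err:
                best, best_err = g, e
        children: List[Genome] = []
        while len(children) < pop_size:
            p1 = tournament(pop, errors, t_size, rng)  # first tournament
            p2 = tournament(pop, errors, t_size, rng)  # second tournament
            c1, c2 = crossover(p1, p2, rng)
            children += [mutate(c1, rng), mutate(c2, rng)]
        pop = children[:pop_size]
    return best, best_err
```

Because the error is a TSS, lower is better, so each tournament keeps the minimum. Plain generational replacement is not elitist, which is why `best` tracks the best genome seen across all generations.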
Fig. 3. TSS transition: (i) sum of all genes’ TSS, (ii) TSS of each gene



4 Results and Discussion

The TSS (total sum of squared errors) over the generations is shown in Fig. 3(i).
In the early generations, the TSS decreases rapidly. After the 50th generation,
the TSS appears almost converged. Figure 3(ii) shows the TSS transition of each
gene over the generations. This result shows that the TSS of each gene converges.

Figure 4 shows the optimized expression patterns at generations 10, 20,
30, 40, 50, and 200. According to these results, the parameters of genes
positioned upstream in the regulatory hierarchy are optimized earlier than those
of downstream genes. These graphs also show that the expression patterns are
refined as the number of generations increases.






Fig. 4. Optimized expression patterns 



After the TSS has almost converged (generation 200), the parameters of the maternal
genes and gap genes are properly optimized, so that the expression patterns
of those genes are successfully reproduced. However, the expression pattern of
even-skipped differs from the ideal expression pattern.

The simulation results show that only the even-skipped expression pattern
could not be properly reproduced. We think the regulation of even-skipped must
be quite complex, because it must produce the three peaks along the
anterior-posterior axis and even-skipped lies downstream in the regulatory
hierarchy. In this situation, a different methodology is needed to obtain
optimized values for the downstream parameters. For example, it could
work to partition the parameters according to their position in the regulatory
hierarchy and to optimize them step by step, from upstream to downstream.
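The stepwise strategy suggested above can be sketched as block-wise optimization over the hierarchy: parameters are grouped by level, each level is fitted against the error of its own genes while upstream values stay fixed, and downstream parameters keep their initial values until their turn. For brevity the sketch uses an exhaustive grid search on [0, 1] in place of a GA; the grid resolution, the per-level error functions, and all names are hypothetical.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, float]
Level = Tuple[Sequence[str], Callable[[Params], float]]


def grid_search(error_fn: Callable[[Params], float], params: Params,
                names: Sequence[str], steps: int = 20) -> Params:
    """Try every grid point in [0, 1] for `names`, keeping all other params fixed."""
    grid = [i / steps for i in range(steps + 1)]
    best_err, best_vals = float("inf"), None
    for vals in itertools.product(grid, repeat=len(names)):
        trial = {**params, **dict(zip(names, vals))}
        err = error_fn(trial)
        if err < best_err:
            best_err, best_vals = err, vals
    return {**params, **dict(zip(names, best_vals))}


def stepwise_optimize(levels: List[Level], init: Params) -> Params:
    """Fit upstream levels first; each later level sees the already-fitted upstream values."""
    params = dict(init)
    for names, level_error in levels:
        params = grid_search(level_error, params, names)
    return params
```

In practice each level's error function would run the simulation with the current parameters and return the TSS of the genes at that level only (for example, maternal genes first, then gap genes, then even-skipped).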

Validation of the assumed model: The transition of the TSS may help to screen out
inappropriate assumptions in the simulation. In this paper, we simulated
transcription only through protein-protein concentration interactions. This is a
common approximation of transcription in biological simulation. For even-skipped
regulation, however, additional fine-scale regulation localizes the eve protein in
particular positions of the embryo, which is strongly based on the structure of the
binding sites in the promoter region. Consequently, the TSS might be able to reveal
whether an approximation in the model is valid. At the least, parts of the model
whose TSS does not converge may indicate where an inappropriate approximation
has been used.

Temporal dynamics: Biological experiments commonly yield gene expression
data as qualitative on-off binary data, so normally we can obtain
only information about which gene activates or represses which gene. In addition,
there is little information about the temporal transition of gene expression. In a
computer simulation we can observe the temporal dynamics of the transition of
the gene expression patterns. To make the values of the parameter set biologically
more significant, we need more temporal gene expression data with
quantitative information.
Abstract. In this paper we propose a comprehensive model of axis
determination for the Drosophila wing disc, and we simulate spatio-temporal
gene expression patterning in the Drosophila wing disc. The
role of interactions among the genes involved has been analyzed
systematically in this simulation. The results illustrate that successful
reconstruction of the gene expression patterns observed in the actual wing
disc requires several hypothetical gene regulations. Above all, we predict
the possible existence of a diffusive factor responsible for formation of
the D-V boundary in the Drosophila wing disc.



1 Introduction 

The fruit fly Drosophila melanogaster is one of the most popular animal models 
used in developmental biology, and a substantial amount of knowledge about 
genetic interactions and cell-cell interactions has been accumulated using this 
species and various genetic techniques. Scientists working in the field of pattern 
formation have investigated the Drosophila wing formation extensively, and have 
identified several genes responsible for the formation of the anterior-posterior
(A-P), dorsal-ventral (D-V) and proximal-distal (P-D) axes. The mechanism of
D-V boundary formation, however, is not well understood and is thus a major
research topic related to the Drosophila wing disc. Although the development of
the Drosophila wing disc is so complex that a comprehensive model is required
for its systematic understanding, there is very little published literature on the
systematic modeling of the wing disc.

The goal of this paper is to propose a biologically faithful model of axes deter- 
mination in Drosophila wing disc and to reproduce the gene expression patterns 
observed in actual wing discs. To take an integrative approach to analyzing the
regulation involved in the Drosophila wing, we need to model the developmental
processes and simulate the spatio-temporal patterning in the development of the
Drosophila wing disc.







2 Drosophila Wing Disc 

A-P axis determination: Figure 1(a) shows the gene regulations of A-P axis
determination. Hh diffuses to the anterior side and activates dpp expression
via Ci [1] (in this paper, genes such as engrailed are indicated
by their short italicized forms (en) and the proteins they produce by
capitalized but unitalicized short forms (En)). En represses dpp expression,
and dpp is expressed at the A-P boundary. Recent evidence suggests that
Dpp acts as a morphogen and determines A-P positional information.




(a) A-P axis determination (anterior cell, A-P boundary, posterior cell)

(b) D-V axis determination (dorsal cell, D-V boundary cells, ventral cell)



Fig. 1. A-P and D-V axis determination mechanism (a): [Dominguez et al., 96], (b): 
[de Celis et al., 97] 

D-V axis determination: Figure 1(b) shows the gene regulations of D-V axis
determination. In the dorsal compartment the selector gene ap is expressed
continuously and activates fng transcription. Fng encodes a secreted protein,
which induces Ser expression. Ser is thought to act as a transmembrane protein
that interacts with N receptors expressed on adjacent cells. N activation
at the dorsal-ventral boundary has non-autonomous effects on proliferation
and activates the expression of the target genes N, wg, and ct [2]. Diffusing Wg
activates the target genes Ser and Dl; Dl is thought to act as an N ligand.
Since Ct inhibits the expression of Ser and Dl, or inhibits the function of
Wg, Ser and Dl cannot be expressed at the D-V boundary [3]. A recent study,
however, observed that Dl is potentially a diffusible ligand [4]. These
results support the idea that Dl diffuses to the D-V boundary and maintains N,
wg, and ct expression, but they cannot explain the initial N expression. Both Ser
and Dl accumulate in dorsal and ventral cells that flank the D-V boundary. The
expression of vg is activated by diffusing Wg from the D-V boundary [5].

P-D axis determination: Cells at the intersection of a stripe of Dpp with a
stripe of Wg express the al gene. The al-expressing cells are fated to become
the most distal cells of the wing. dll expression is activated by Wg, which
acts as a long-range morphogen.



3 Modeling Framework 

Model Equation: In this simulation, we model and implement transcrip- 
tion/translation, cell-cell interaction, protein diffusion and degradation. We unify 
these processes with the following equation: 






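$$
\frac{\partial U_i}{\partial t} = D_i \left( \frac{\partial^2 U_i}{\partial x^2} + \frac{\partial^2 U_i}{\partial y^2} \right) - g\,U_i + f_i(U) \qquad (1)
$$

U_i : concentration of protein i
D_i : diffusion constant
t : time
f : protein production function
x : position on x axis
y : position on y axis
U : concentration vector
g : degradation rate (g = 0.1)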

Transcription/Translation Modeling: Since transcriptional regulation is
well characterized but translational regulation is not, we combine the transcription
and translation processes into one black box, which means that the amount of
transcribed mRNA is assumed to equal the amount of resultant protein. When the
rightmost term of equation (2) is greater than a specific threshold value, the
transcription function of gene X is determined as follows:



$$
f_X(U) = \frac{\sum_i \alpha_i U_A}{1 + \sum_i \alpha_i U_A + \sum_j \alpha_j U_R} \qquad (2)
$$



where α_i and α_j are respectively the activation and inhibition rates of gene
products i and j, and U_A and U_R are the concentrations of activator and
repressor proteins. To account for the case in which neither the activator nor
the repressor binds to the promoter region of the target gene, we introduce a
constant value 1 that adjusts the probability of transcription.
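A minimal sketch of a saturating transcription function of this kind, assuming the numerator is the weighted sum of activator concentrations so that the response lies between 0 and 1. Passing each rate together with its concentration as a (rate, concentration) pair, and the function name, are illustrative choices.

```python
from typing import Iterable, Tuple

Term = Tuple[float, float]  # (rate alpha, concentration U)


def transcription_function(activators: Iterable[Term], repressors: Iterable[Term]) -> float:
    """Saturating activator/repressor response in [0, 1)."""
    a = sum(alpha * u for alpha, u in activators)
    r = sum(alpha * u for alpha, u in repressors)
    return a / (1.0 + a + r)  # the constant 1 covers the unbound-promoter case
```

For example, one activator with rate 1 at concentration 1 gives 0.5; adding a repressor with rate 1 at concentration 2 lowers this to 0.25, and with no activator bound the output is 0.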

Cell-cell Interaction Modeling: The cytoplasmic protein in a cell receives a
signal from receptor-ligand complexes activated by ligand, and the strength of
this signal depends on the concentrations of the receptor and ligand. In this
simulation, we determine the activity of the cytoplasmic protein as follows:



$$
\mathrm{act} = \frac{\alpha_R U_R}{1 + \alpha_R U_R} \cdot \frac{\alpha_L U_L}{1 + \alpha_L U_L} \qquad (3)
$$



where α_R and α_L are the transport rates of receptor and ligand,
U_R and U_L are the concentrations of receptor and ligand, and n is the
number of surrounding cells. For example, the concentrations of both N and Dl
determine Su(H) activity, which, when high, results in the expression of the
target genes wg, ct, and N.
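A minimal sketch of this receptor-ligand signal, assuming it is the product of two saturating terms, one for the receptor in the cell and one for the ligand presented by its surrounding cells, and assuming (our choice) that the ligand from the n neighbours of a square-lattice cell is pooled by summation. All names are hypothetical.

```python
from typing import Iterator, Sequence, Tuple


def saturate(x: float) -> float:
    """x / (1 + x): rises from 0 toward 1 as x grows."""
    return x / (1.0 + x)


def signal_activity(receptor: float, neighbor_ligand: Sequence[float],
                    alpha_r: float, alpha_l: float) -> float:
    """Activity of the cytoplasmic transducer (e.g. Su(H)) in one cell.

    Assumption: ligand presented by the n surrounding cells is summed.
    """
    u_l = sum(neighbor_ligand)
    return saturate(alpha_r * receptor) * saturate(alpha_l * u_l)


def neighbors(i: int, j: int, rows: int, cols: int) -> Iterator[Tuple[int, int]]:
    """4-neighbourhood on a square lattice (assumed); cells outside the grid are skipped."""
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        a, b = i + di, j + dj
        if 0 <= a < rows and 0 <= b < cols:
            yield a, b
```

A cell with no ligand-presenting neighbours receives no signal, whatever its receptor level, which matches the requirement that both receptor and ligand be present.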



4 A Model of Wing Axis Determination 

In our model, each cell is represented as a square in a two-dimensional lattice, and
the cells are arranged in a circle. For the gene regulatory relationships, we implement
the gene regulation network shown in Figure 1. Some regulatory relationships, however,
such as Fng activating N expression and a dorsally localized protein inhibiting
N expression, are only hypothetical relationships introduced in order to make the
expression patterns consistent with actual data. In the initial condition, ci is
already expressed in the whole anterior compartment, en is expressed in all of the
Panels: (i) En, (ii) Hh, (iii) Dpp, (iv) Fng, (v) N, (vi) Dl, (vii) Wg, (viii) Al, (ix) Dll

Fig. 2. Protein localization patterns in the Drosophila wing disc. (i)-(iii): A-P axis determination; (iv)-(vii): D-V axis determination; (viii), (ix): P-D axis determination









posterior compartment, and ap is localized in the dorsal compartment. Figure
2 shows simulated protein patterns, reproduced using known gene interactions
plus a few hypothetical interactions, for the genes involved in the
development of the third instar Drosophila wing disc. The hypothetical
interactions introduced are: (1) Fng activates N transcription, and (2) Ap or an
Ap downstream gene product inhibits N transcription. These results agree
well with experimental biological data. In the actual wing disc, the gene N is
weakly expressed in the dorsal and ventral compartments, and Dl is weakly
expressed in the whole ventral compartment. A simulation that implements only
known genetic interactions, however, cannot reconstruct the expression of N and
Dl. The expression of most genes involved in axis formation therefore cannot be
reproduced either, because they are located downstream of N-Dl in the genetic
cascade (data not shown, because no gene is expressed at all). Other gene
regulations must be assumed if these patterns are to be reproduced. After a
series of simulations, we confirmed that the key to reconstructing the gene
expression patterns forming the D-V axis is the initial expression of N. For the
initial N expression, within the limits of the known genes and developmental
processes, we assumed that Fng activates N expression as a diffusive factor, and
that N expression is inhibited by Ap or an Ap downstream gene product localized
in the dorsal compartment. Using this hypothesis, we can reconstruct the actual
gene expression patterns of the Drosophila wing disc. Figure 3 shows the
comprehensive model of D-V boundary determination for the Drosophila wing disc.
Although the idea that Fng activates N transcription may surprise knowledgeable
readers, the systematic approach indicates that a diffusive factor or some other
mechanism must be responsible for the initial N expression.




Fig. 3. Comprehensive model of D-V boundary determination. The long-dashed line 
shows the hypothetical gene regulation 

5 Conclusion 

In this paper we proposed a comprehensive model of axis determination in 
Drosophila wing discs, and reconstructed in computer simulations the gene
expression patterns actually observed in the wing disc. We inferred that Fng
protein activates N transcription and confirmed by computer simulation that this
is a plausible mechanism of the D-V axis determination for the Drosophila wing
disc. Detailed kinetic analysis and simulation of D-V axis formation led us to a 
better understanding of developmental processes. We think that our prediction 
will soon be confirmed by biological experiments. 
Abstract. In order to understand and infer the principles underlying biological
phenomena, it is necessary to handle massive data on expression
patterns, kinetics, and metabolism, so that plausible regulatory
mechanisms can be revealed. The in silico sampling and screening method
described in this paper automatically infers possible regulatory network
structures using several gene expression profiles. In an experimental
evaluation of the feasibility of this method, all possible
topologies of three-unit networks were tested exhaustively. After a
genetic algorithm was used to identify the parameter set for each topology, the
plausible topologies were selected by using mutant gene expression data.
The experimental results demonstrate that the method can derive a set
of possible network structures that includes the correct one.



1 Introduction 

Biochemical and genetic approaches have identified the molecular mechanisms of
many genetic reactions. With a series of large research projects under way
(e.g., the Human Genome Project), DNA sequence data and expression maps of the
genes in each cell at each stage (e.g., in humans) are expected to become
available within a few years. In particular, recent technologies such as DNA
microarrays will reveal the expression pattern of each gene as a time series [1-3].

The major question is how we can utilize such data to better understand
biological systems. One of the things we would like to do is to identify the gene
regulatory network behind specific biological processes. Time series expression
data obtained from DNA microarrays are among the most useful kinds of data
that can be used to reconstruct gene regulatory networks. This paper therefore
describes an initial attempt to “reverse engineer” a biological system by using
the rich data generated by fast-growing high-throughput biology.
Identifying the topology of a gene regulatory network is a non-trivial task.
There have so far been several attempts to identify or reconstruct genetic
networks for specific biological problems [4-7]. Those studies have focused on
target-specific networks, and each regulatory relationship was identified from
single-gene experiments or from published literature on such experiments.

On the other hand, theoretical models of genetic networks, such as Boolean
networks [8, 9], offer one approach to identifying the network topology. This
model simplifies the assumptions about biological systems and treats gene
expression as either completely on or off, thereby ignoring genes whose
regulatory effects depend on their expression level. Cluster analysis is another
approach, which divides large-scale data into small subgroups [10, 11]. In such
analyses, correlated data sets are chosen using the Euclidean distance between
pairs of gene expression patterns. Furthermore, there have also been
many attempts to model network architecture [12-15]. However, no effective
method to reconstruct gene regulatory networks from time series expression data
has been proposed so far.

Most previous work on genetic regulatory networks has either focused on
modeling the network or has classified genes by expression pattern. Considering
the huge number of components and functions involved in biological systems, the
number of combinatorial regulations possible in these networks is extremely large.
Hence, when modeling the networks or analyzing the expression patterns, it is
important to carefully take into account redundancy, network combinations, and
conditions incompatible with biological phenomena. In addition, since there must
be regulatory mechanisms and factors not yet identified, the method used to
reconstruct the networks should be capable of handling such additional
possibilities and constraints.

The work presented in this paper aims at extracting a candidate group of
possible topologies from a huge topology space, not uniquely, but as a group.
Then, using additional data such as knock-out expression profiles, plausible
topologies can be resolved.



2 in silico Sampling and Screening 

First, it should be noted that what we are trying to identify are genetic regulatory
networks: weight-labeled networks that define the regulatory relationships between
genes. Although, for the sake of simplicity, this paper considers genes as the sole
elements (which we refer to as “nodes”) of the network, RNA, proteins, other
components, and even tissues can be incorporated when more accurate modeling
is desired.

As shown in Fig. 1 (which represents a case with up to four genes), our method
consists of two fundamental stages: a sampling stage and a screening stage. In the
sampling stage, possible topologies are chosen, according to the selected target,
as virtual initial sampled topologies in a topology pool. If, for instance, the
target topology is one in which four nodes interfere with each other, all possible
topologies including up to four units are sampled for the initial topology group.
If the network is small and tractable, all network topologies can be examined
exhaustively. An exhaustive examination, however, is not likely to be feasible,
because realistic networks are usually large enough that simulating every topology
would take a considerable amount of time; other strategies, such as randomly
picking networks, can be used instead to limit the number of sampled topologies.
For each possible network, the set of parameters that generates the simulated
expression profile which best fits the target data is then identified by optimization.
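For small networks the exhaustive option can be made concrete. The sketch below assumes that each possible directed edge is either activating, repressing, or absent, and enumerates the resulting signed topologies; the weights would then be fitted separately for each topology. Whether self-regulation is allowed is not stated in the text, so it is a parameter here, and the function names are hypothetical. The counts, 3^(n·n) with self-loops, show how quickly exhaustive enumeration becomes infeasible.

```python
import itertools
from typing import Dict, Iterator, Tuple

Edge = Tuple[int, int]
Topology = Dict[Edge, int]  # (regulator, target) -> +1 activation, -1 repression


def enumerate_topologies(n: int, self_loops: bool = True) -> Iterator[Topology]:
    """Yield every signed topology on n nodes; absent edges are omitted."""
    edges = [(i, j) for i in range(n) for j in range(n) if self_loops or i != j]
    for signs in itertools.product((-1, 0, 1), repeat=len(edges)):
        yield {e: s for e, s in zip(edges, signs) if s != 0}


def count_topologies(n: int, self_loops: bool = True) -> int:
    """Each possible edge is activating, repressing, or absent."""
    m = n * n if self_loops else n * (n - 1)
    return 3 ** m
```

For three units this gives 19,683 topologies with self-regulation and 729 without; for four units with self-regulation it is already 3^16, about 43 million, which is why random or stochastic sampling is offered as an alternative.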




Fig. 1. Schematic representation of our method of in silico sampling and screening.



Sampling The aim of the sampling stage is 1) to select, if exhaustive enumeration
is not feasible, candidate network topologies from the entire space of
network topologies, and 2) to identify a set of parameters for each network that
generates a simulated expression profile best fitting the target expression profile.
This is done as follows:

1. An expression profile² from which plausible topologies are to be identified is
selected.

2. Possible topologies are chosen (exhaustively, randomly, or by stochastic
search) to form a pool of sampled data, group D.

3. For each network topology in the sampled data group D, the set of parameters
whose simulated expression profile best fits the target expression profile is
identified. This can be done with a genetic algorithm (GA) or some other
optimization scheme.

4. According to the value of TSS (total sum of squared errors) against the target
expression profile, the following subgroup $D^+$ of topologies whose TSS falls
below the fitness threshold $T^+$ is determined:

$$D^+ = \{\, \delta \in D \mid \mathrm{TSS}(\delta) < T^+ \,\} \qquad (1)$$
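Step 4 and Eq. (1) reduce to computing one TSS per topology and filtering. A minimal sketch, assuming the optimizer of step 3 has already produced, for each topology, the TSS of its best-fit parameters; profiles are lists of time points, each a list of node expression values. The function names and the sample TSS values are illustrative only.

```python
def tss(simulated, target):
    """Total sum of squared errors between two equally long expression time series."""
    return sum(
        (s - t) ** 2
        for sim_row, tgt_row in zip(simulated, target)  # one row per time step
        for s, t in zip(sim_row, tgt_row)              # one value per node
    )


def sampling_subgroup(best_tss, t_plus):
    """Eq. (1): keep the topologies whose best-fit TSS is below the threshold T+."""
    return {topology for topology, err in best_tss.items() if err < t_plus}


print(tss([[0, 1], [1, 1]], [[0, 0], [1, 3]]))   # 5
print(sorted(sampling_subgroup({"t1": 29.7276, "t2": 1250.0, "t3": 640.0}, 1000)))  # ['t1', 't3']
```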



Screening Given the set of candidate network topologies and their parameters, the
aim of the screening stage is to select a set of final candidates using various mutant
expression profiles. This is done as follows:



1. Select the mutant expression profiles to be used for screening.

2. For each mutant expression profile, measure the similarity between it and the
simulated expression profile generated by each network. In this evaluation, 1) the
network weights identified in the sampling stage are reused unchanged, and 2) the
network topology or the function of a node is modified to reflect the mutant in
question.

3. According to the fitness threshold $T^x$ ($x = A^-, \ldots, N^-$), which can be set
independently for each mutant, the subgroups $D^x$ are resolved as follows:

$$D^{A^-} = \{\, \delta \in D^+ \mid \mathrm{TSS}(\delta^{A^-}) < T^{A^-} \,\}, \;\ldots,\; D^{N^-} = \{\, \delta \in D^+ \mid \mathrm{TSS}(\delta^{N^-}) < T^{N^-} \,\} \qquad (2)$$

where $A^-$, for example, denotes the knock-out mutant of gene A, and $\delta^{x}$
denotes network $\delta$ modified to reflect mutant $x$, its TSS being measured
against the profile of mutant $x$.

4. From all the subgroups derived in the previous step, the group $d$ of selected
network topologies is extracted as follows:

$$d = \{\, \delta \in (D^{A^-} \cap \cdots \cap D^{N^-}) \mid \mathrm{TSS}(\delta) < \ldots \,\} \qquad (3)$$
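The screening steps can be sketched as set operations. This sketch assumes the TSS of every topology in $D^+$ against each mutant profile has already been computed (the mutant-specific simulation itself is not shown). Eq. (3) is approximated here by the plain intersection of the subgroups of Eq. (2); any further TSS condition on d is omitted, which is an assumption of the sketch. Topology names and TSS values are hypothetical.

```python
def screening_subgroup(d_plus, mutant_tss, threshold):
    """Eq. (2) for one mutant x: members of D+ whose mutant simulation has TSS < T^x."""
    return {topology for topology in d_plus if mutant_tss[topology] < threshold}


def final_group(d_plus, tss_by_mutant, thresholds):
    """Eq. (3), sketched as the intersection of all per-mutant subgroups D^x."""
    d = set(d_plus)
    for mutant, mutant_tss in tss_by_mutant.items():
        d &= screening_subgroup(d_plus, mutant_tss, thresholds[mutant])
    return d


d_plus = {"t1", "t2", "t3"}
tss_by_mutant = {
    "B-": {"t1": 120.0, "t2": 610.0, "t3": 230.0},   # hypothetical TSS values
    "C-": {"t1": 80.0, "t2": 40.0, "t3": 720.0},
}
print(final_group(d_plus, tss_by_mutant, {"B-": 500, "C-": 500}))  # {'t1'}
```

Each extra mutant profile adds one more set to intersect, which is why additional knock-out data can only shrink (never enlarge) the final group.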

By applying the two stages of sampling and screening in turn, the set of potentially
plausible topologies is narrowed down step by step.

² The expression profile is time-series data of mRNA or protein concentration, as
shown in Fig. 2. All data must come from the wild type; mutant data are not included.

3 Experimental Design

3.1 Modeling of Regulative Interaction 

Let us assume a network with N nodes. With vast simplification, the regulative
interactions between the nodes can be represented by the N×N weight matrix W of a
fully connected network and an N×N connection matrix C. Each weight w takes a real
value between -1.0 and +1.0: a negative value represents repressive regulation of the
target node, and a positive value represents promoting regulation. In matrix C, an
element $c_{pq}$ represents the connection between nodes p and q with a Boolean
value (0 means unconnected and 1 means connected).

The expression state of a genetic network containing N nodes at a discrete time t
is represented by a vector x(t) in N-dimensional space. Each element
$x_i(t)$ ($i = 1, \ldots, N$) is defined by the following rule:



$$x_i(t+1) = F\!\left( \sum_{j=1}^{N} c_{ij}\, w_{ij}\, s_j(t) - h_i \right), \qquad F(u) = \begin{cases} 0 & (u < 0) \\ u & (0 \le u \le 1) \\ 1 & (u > 1) \end{cases} \qquad (4)$$

where $s_i(t)$ is the concentration of the expression product of component i
($i = 1, \ldots, N$), and $h_i$ is the threshold for promoting component i, which
determines its promoting level. As described above, all regulatory interactions are
defined in a simple manner; more sophisticated models could be applied in the course
of further development.
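Eq. (4) can be simulated directly. A minimal sketch under stated assumptions: the product concentration $s_j(t)$ is taken to be the expression level $x_j(t)$ itself, `C[i][j] = 1` means node j regulates node i, and the weights and thresholds used for the Fig. 2(b) topology (A self-activation, A→B, B→C, C⊣A) are hypothetical values, not the ones fitted in the paper.

```python
def F(u):
    """Piecewise-linear transfer function of Eq. (4): clamps u to [0, 1]."""
    return min(1.0, max(0.0, u))


def step(x, C, W, h):
    """One synchronous update x(t) -> x(t+1) of Eq. (4), assuming s_j(t) = x_j(t)."""
    n = len(x)
    return [F(sum(C[i][j] * W[i][j] * x[j] for j in range(n)) - h[i]) for i in range(n)]


def simulate(x0, C, W, h, steps):
    """Return the trajectory [x(0), x(1), ..., x(steps)]."""
    trajectory = [list(x0)]
    for _ in range(steps):
        trajectory.append(step(trajectory[-1], C, W, h))
    return trajectory


# Fig. 2(b) topology with nodes ordered A, B, C; C[i][j] = 1 if node j regulates node i.
C = [[1, 0, 1],   # A <- A (autoregulation), A <- C
     [1, 0, 0],   # B <- A
     [0, 1, 0]]   # C <- B
W = [[0.9, 0.0, -0.9],   # hypothetical weights; the negative entry makes C inhibit A
     [0.8, 0.0, 0.0],
     [0.0, 0.8, 0.0]]
h = [0.0, 0.0, 0.0]      # hypothetical thresholds
for t, x in enumerate(simulate([0.5, 0.0, 0.0], C, W, h, 3)):
    print(t, [round(v, 3) for v in x])
```

Because F clamps every update, all expression levels stay in [0, 1] regardless of the weights, which keeps the TSS of different candidate networks on a comparable scale.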

3.2 A Simple Example with Three Interacting Genes 

Based on the rules described above, we use a small network with three nodes as an
example. Although the experiment described here uses the quite simple setting of a
three-node topology, such regulative interactions do occur in biological systems, as
can be seen in circadian rhythms, which involve three genes [16]. In the Drosophila
circadian system, dCLOCK (dCLK)-CYCLE (CYC) heterodimers activate the expression
of PERIOD (PER) and TIMELESS (TIM). PER-TIM heterodimers, in turn, block the
transcriptional activation by dCLK-CYC heterodimers. The cycle thus closes as a
transcription/translation-based negative feedback loop. There are many cases in which
even a simple gene interaction plays a critical role, yet the exact regulatory network
has not been identified.

If each connection takes one of two states, connected or unconnected, the number of
possible topologies for a three-node network is $(2^3)^3 = 2^9 = 512$, since each of the
nine ordered node pairs (including self-loops) may or may not be connected. Let us
consider the topology shown in Fig. 2(b). Node A activates B and also has an
autoregulation function. Node B activates C, while node C inhibits A. The initial
expression levels are set to A = 0.5, B = 0.0, and C = 0.0. With these initial values,
this network generates the artificial expression pattern shown in Fig. 2(a).




Fig. 2. Sample of time-series data, the topology used as target data, and knocked-out
gene expression time-series data used for screening: (a) wild-type expression pattern;
(b) example of a three-unit topology; (c) $B^-$ expression pattern; (d) $C^-$ expression
pattern.



We used a GA to determine the weight and threshold parameters. The GA uses 2-elite
conservation and tournament selection with tournament size 5 to choose mates. The
population size is 300, the number of generations 300, and the mutation rate 0.01,
and one-point crossover is applied.
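The GA settings above can be made concrete with a compact real-valued GA. This is a generic sketch, not the authors' implementation: the genome encoding (a flat list of parameters in [-1, 1]), the mutation operator (resampling a gene uniformly) and the toy objective in the usage lines are assumptions. In the paper's setting the genome would hold the weights and thresholds of one candidate topology and `fitness` would return its TSS against the target profile.

```python
import random


def run_ga(fitness, n_genes, pop_size=300, generations=300, elite=2,
           tournament=5, mutation_rate=0.01, seed=0):
    """Minimize `fitness` over real-valued genomes whose genes lie in [-1, 1]."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_genes)] for _ in range(pop_size)]

    def pick(scored):
        # Tournament selection: the lowest-error genome among `tournament` random entries.
        return min(rng.sample(scored, tournament), key=lambda p: p[0])[1]

    for _ in range(generations):
        scored = sorted(((fitness(g), g) for g in pop), key=lambda p: p[0])
        nxt = [g[:] for _, g in scored[:elite]]          # 2-elite conservation
        while len(nxt) < pop_size:
            a, b = pick(scored), pick(scored)
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = a[:cut] + b[cut:]                    # one-point crossover
            child = [rng.uniform(-1, 1) if rng.random() < mutation_rate else v
                     for v in child]                     # per-gene mutation
            nxt.append(child)
        pop = nxt
    return min(pop, key=fitness)


target = [0.3, -0.5, 0.8]   # toy target standing in for a real TSS computation


def sq_error(genome):
    return sum((g - t) ** 2 for g, t in zip(genome, target))


best = run_ga(sq_error, n_genes=3, pop_size=60, generations=50, seed=1)
print(best, sq_error(best))
```

Elitism guarantees that the best error never increases from one generation to the next, so the returned genome is at least as good as the best member of the initial random population.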

3.3 in silico Sampling 

For sampling, the expression pattern of Fig. 2(a) was used as target data to resolve
group D into subgroup $D^+$. We tested this method exhaustively, examining all
possible topologies. For each network topology, the parameter set that best reproduces
the target expression profile was searched for using the GA.

3.4 in silico Screening 

For screening, expression patterns for $B^-$ and $C^-$ were prepared in advance as
sample patterns; they are shown in Fig. 2(c) and Fig. 2(d). The $B^-$ and $C^-$
expression profiles are time-series gene expression data obtained when the
transcription of B or C, respectively, is disabled. The threshold parameters $T^+$,
$T^{B^-}$, and $T^{C^-}$ for the screening were set to 1000, 500, and 500, respectively.
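A knock-out profile such as $B^-$ can be generated from the same model by disabling one gene's transcription. The sketch below repeats Eq. (4) so that it runs on its own. It assumes, as a hypothetical modeling choice, that $s_j(t) = x_j(t)$, that `C[i][j] = 1` means j regulates i, and that disabling transcription means holding the gene's expression at 0 throughout the run; the weights are the same hypothetical values as for the Fig. 2(b) topology, not fitted ones.

```python
def simulate_knockout(x0, C, W, h, steps, knocked_out):
    """Simulate Eq. (4) with the transcription of gene `knocked_out` disabled."""
    n = len(x0)
    x = list(x0)
    x[knocked_out] = 0.0  # the disabled gene is never expressed
    trajectory = [x]
    for _ in range(steps):
        prev = trajectory[-1]
        nxt = [min(1.0, max(0.0, sum(C[i][j] * W[i][j] * prev[j] for j in range(n)) - h[i]))
               for i in range(n)]
        nxt[knocked_out] = 0.0
        trajectory.append(nxt)
    return trajectory


# Fig. 2(b) topology (A, B, C) with hypothetical weights; knock out B (index 1).
C = [[1, 0, 1], [1, 0, 0], [0, 1, 0]]
W = [[0.9, 0.0, -0.9], [0.8, 0.0, 0.0], [0.0, 0.8, 0.0]]
b_minus = simulate_knockout([0.5, 0.0, 0.0], C, W, [0.0, 0.0, 0.0], 5, knocked_out=1)
```

Without B, C is never activated, so A only decays through its own autoregulation (A(t) = 0.5 · 0.9^t with these weights). A candidate topology that lacks the B→C link would produce a different $B^-$ profile, which is what lets the screening stage discriminate between candidates.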

4 Results and Analysis 

As for the results of sampling, the fitness convergence of the topology that showed
the minimum TSS in the wild type (29.7276) is shown in Fig. 3, and that topology and
its expression pattern are shown in Fig. 4(a) and Fig. 4(b). Although both Fig. 3(a)
and (b) partially show gentle slopes, the results indicate fast convergence.
Fig. 3. Convergence of fitness
Fig. 4. Expression transition and regulative description of the topology that showed
the best fitness among all topologies when compared with the wild-type expression
pattern



Fig. 5 shows a histogram of the number of topologies in each TSS range, covering all
512 topologies. The sampling data indicate that the distribution of topologies is
constant for TSS values up to 800, above which it increases abruptly. As a result of
the sampling, 283 topologies were derived as $D^+$.




Fig. 5. Histogram of all topologies against TSS range (horizontal axis: TSS) in the
wild type. “Minimum” is the distribution of the best fitness among individuals,
whereas “average” takes the average over all individuals for each topology



As the result of the screening process, the histograms of the number of topologies in
each TSS range for $B^-$ and $C^-$ are shown in Fig. 6(a) and Fig. 6(b), respectively.
Subgroups $D^{B^-}$ and $D^{C^-}$ contained 37 and 18 topologies, respectively.
Consequently, 8 topologies remained in the final subgroup d, as shown in Fig. 7.
Encouragingly, the correct topology, shown in Fig. 2(b), is included in this subgroup.





Fig. 6. Histogram of all topologies against TSS range for the $B^-$ and $C^-$ mutants:
(a) $B^-$ mutant; (b) $C^-$ mutant
With only three expression profiles, it is not possible to select a single topology from
these candidates, because all of the screened topologies generate equally good
simulated expression profiles. Using additional mutant profiles should further reduce
the number of final candidates.

At the same time, close examination of the candidate topologies revealed features
shared by many of them, such as the autoregulatory loop of A and the inhibition of A
by C. Extracting the basic common features of the final candidates might be a
feasible way to further reduce the number of candidate topologies.
Fig. 7. Eight topologies screened as subgroup d



To confirm the stability of the approach, we examined another expression pattern
with a different type of dynamics. In that case, subgroups $D^{B^-}$ and $D^{C^-}$
contained 42 and 56 topologies, respectively, and d contained only 20 topologies. In
this experiment, too, the correct topology was in subgroup d. Although the model
used for regulative interaction is abstract and based on a linear equation with a
weight matrix, the results of these experiments support the validity of our method
for narrowing expression data down to a small set of topologies that includes the
correct one.

To demonstrate the efficiency and scalability of our method, further investigation
using more complex, large-scale data is needed. In addition, another criterion is
necessary to resolve the subgroup into a unique plausible topology.



5 Concluding Remarks 

In this paper, we have described an in silico sampling and screening method that can
be used to screen out plausible network topologies from large-scale expression data.
An experiment in which plausible three-node genetic network topologies were
selected demonstrated that this method can screen out a small set of topologies that
includes the correct topology. We believe that, by embedding hypothetical factors in
the network, the method can be applied not only to screening plausible topologies
but also to verifying hypothetical regulation.
Abstract. A population of artifacts that increase the energy extracted from the
environment by the individuals that use them evolves through selective
reproduction of artifacts and the addition of randomly generated new variants.
In a series of simulations we explore the consequences of adopting different
criteria for selecting artifacts for reproduction and the effects of importing
better quality artifacts from a technologically more advanced population to a
less advanced one. The main results are: (a) the best selection strategy is to
have individuals with the same energy extraction capacity test all the artifacts;
(b) artifacts tend to amplify interindividual differences in extracted energy; (c)
better imported artifacts accelerate technological progress only if they are not
too few in number; in any case the acceleration effect appears to be temporary.



1. Introduction 

Cultural and technological evolution is a crucial component of the adaptive pattern of
humans that cannot be ignored if Artificial Life is to account for the specific
properties of human behavior. While many important formal and mathematical
models of cultural transmission and evolution have been published [1], [2], [3], there
has been little work on agent-based models and simulations, in which one is less
interested in the relationships among aggregate variables and instead expects
aggregate phenomena to emerge from the local interactions of many simple “agents”.
In this paper we present simulations of some limited aspects of the evolution of
artifacts using an Artificial Life paradigm. We intend this paradigm to mean not only
an agent-based approach to the evolution of culture and technology, but also the study
of cultural and technological evolution in populations of biological agents [4]. In our
simulations, agents are neural networks living in a physical environment and
inheriting not only artifacts but also genotypes. In other words, technological
evolution is studied against a backdrop of biological evolution.

A population of individuals lives in an environment that contains food. The 
reproductive chances of each individual depend on the individual’s ability to collect 
food. As a result of the selective reproduction of the best individuals and the random 
mutations that occasionally result in better offspring than their parents, the population
tends to exhibit a good capacity to procure food. After the population has reached 
some degree of adaptation to the environment, something new happens. Each 
individual is provided with a certain number of artifacts (food containers) which are 
used to store, transport, or cook the food. The use of the artifacts increases the amount 
of energy obtained from the collected food. The artifacts differ in shape, material, 
technique of production, etc., and their different properties cause the artifacts to differ 
in their “added energy value” for the individuals using them. Assuming that each food 
element yields one unit of energy, the quality of any particular artifact can be 
described as a number that multiplies the energy contained in the food. The higher the 
number, the better the artifact. The “population” of artifacts reproduces selectively 
and with the addition of random mutations in the same way as the population of 
individuals. 

The reproduction of artifacts works as follows: each generation of individuals
produces new artifacts by copying the artifacts of the preceding generation. The
reproduction of artifacts is selective, and copies of “model” artifacts are not identical
to their models because random changes are added during the copying process. This
results in the evolution of artifacts. Although the early generations of artifacts are not
of very good quality, the selective reproduction of the best artifacts and the addition
of new variants cause a progressive improvement in the quality of the artifacts used
by the population of individuals.

In this paper we describe a number of simulations that study the evolution of 
artifacts. We are specifically interested in the mechanisms for selecting the artifacts 
for reproduction and in the effects of introducing high quality artifacts of a 
technologically more advanced population into the artifact pool of a less advanced 
population. 

2. Simulations 

2.1 Environment and task 

The environment is a square grid of 20x20 cells that contains 20 randomly distributed
food elements, each providing one unit of energy. Each environment is inhabited by a
single individual and, since in each generation there are 100 individuals, there are 100
identical environments.

When the individual happens to step on one of the 20 food elements, the food
element disappears and the individual’s energy level is increased by 1 unit. Then a
new food element is introduced in a randomly selected position in the environment so
that the environment always contains 20 food elements. The food-seeking behavior of
the individual is controlled by a simple feedforward neural network with 1 input unit
encoding the position of the nearest food element with respect to the individual, 3
internal units, and 2 output units encoding the movements of the individual in the
environment. The simulation begins with a population of 100 individuals, each living
in its own copy of the environment and with randomly assigned connection weights.
The life of every individual lasts 1,000 input/output cycles. At birth the energy level of
all individuals is set to zero. At the end of life the 20 individuals with the highest
energy level are selected for reproduction.
Each of these 20 individuals generates one copy of its connection weights, which is
assigned to the neural network of one offspring, plus one or more additional copies
that are assigned to a randomly selected number of additional offspring. Hence, one
individual can have 1, 2, or more offspring. However, the number of offspring is
subject to the constraint that the next generation must include the same total number
of individuals as the preceding generation, i.e., 100 individuals.

While the connection weights of the first offspring are exact copies of the parent’s 
weights, the weights of the additional offspring are modified by adding a number 
randomly selected in the interval -0.1/+0.1 to each weight’s current value. The 
process is repeated for a certain number of generations until the energy level of the 
average individual at the end of life is 15 energy units, i.e., 15 food elements are 
captured by the individual. When the population has reached this level of capacity to 
procure food (which takes from 10 to 15 generations), the artifacts are introduced to 
the population. Each simulation is replicated 4 times starting each time from a 
different ‘seed’ for generating the initial set of connection weights, the random 
mutations, and the spatial distribution of food elements. 
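The reproduction scheme for the food-seeking networks can be sketched as follows. How the 80 extra offspring are distributed among the 20 parents is not fully specified in the text, so this sketch assumes each extra slot goes to a parent chosen uniformly at random. Weight vectors are flat lists of floats, and `next_generation` is a hypothetical name.

```python
import random


def next_generation(population, energies, n_parents=20, noise=0.1, seed=0):
    """Return a new population of the same size bred from the most energetic individuals."""
    rng = random.Random(seed)
    ranked = sorted(range(len(population)), key=lambda i: energies[i], reverse=True)
    parents = [population[i] for i in ranked[:n_parents]]
    offspring = [list(weights) for weights in parents]   # one exact copy per parent
    while len(offspring) < len(population):
        parent = rng.choice(parents)                      # assumed: extra slots drawn at random
        # Mutated copy: uniform noise in [-0.1, +0.1] added to every weight.
        offspring.append([w + rng.uniform(-noise, noise) for w in parent])
    return offspring


pop = [[float(i)] * 3 for i in range(100)]   # toy "weight vectors"
kids = next_generation(pop, energies=list(range(100)))
print(len(kids), kids[0])   # 100 [99.0, 99.0, 99.0]
```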

2.2 Artifacts 

An artifact possesses 10 properties (shape, material, technique of production, etc.). 
Each of these properties can have one of two values: 0 or 1. Hence, each individual 
artifact is completely described by a sequence of 10 bits. One arbitrarily selected 
sequence of 10 bits is considered as describing the best possible artifact. If an 
individual were to use this artifact it would obtain twice the quantity of energy from 
each food element collected, i.e., 2 energy units instead of 1. An artifact with 5 bits
having opposite values with respect to the corresponding 5 bits of the best artifact
would allow an individual to obtain 1.5 units of energy from each collected food
element. Hence, the quality of each individual artifact can be described as a number
ranging from 1 to 2. This number multiplies the quantity of energy provided by the
food. An initial population of 100 artifacts is generated by assigning to each artifact
randomly selected values for its 10 bits, with the constraint that no artifact in this initial
population can have a Hamming distance from the best artifact smaller than 6.
Hence, no artifact in the initial population of artifacts can have a utility higher than 1.4.
Five of these 100 artifacts, randomly selected, are assigned as models to be 
reproduced to each of the 100 individuals composing the first generation following 
the one in which the population has reached the average level of 15 energy units. 
While the same artifact is likely to be assigned as a model to be reproduced to 
different individuals, all five artifacts assigned to the same individual are different. 
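The artifact encoding and utility scale can be sketched as follows. The text fixes two points of the scale (identical to the best artifact: 2.0; 5 of 10 bits different: 1.5) and its range of 1 to 2; the linear rule `utility = 2 - d/10` used here is an assumption consistent with those points, and it uses the plain Hamming distance d rather than the continuous variant the authors employ. Drawing the initial pool by rejection sampling is likewise an illustrative choice.

```python
import random

N_BITS = 10


def hamming(a, b):
    """Number of positions at which two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def utility(artifact, best):
    """Assumed linear scale: 2.0 for the best artifact, 1.5 at distance 5, 1.0 at distance 10."""
    return 2.0 - hamming(artifact, best) / N_BITS


def initial_artifacts(best, n=100, min_distance=6, seed=0):
    """Random 10-bit artifacts at Hamming distance >= 6 from the best (utility <= 1.4)."""
    rng = random.Random(seed)
    pool = []
    while len(pool) < n:
        candidate = tuple(rng.randint(0, 1) for _ in range(N_BITS))
        if hamming(candidate, best) >= min_distance:   # rejection sampling
            pool.append(candidate)
    return pool


best = (1, 0, 1, 1, 0, 0, 1, 0, 1, 1)   # arbitrary "best possible" artifact
pool = initial_artifacts(best)
print(max(utility(a, best) for a in pool))
```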

An individual reproduces an artifact by observing the properties of the model and
creating a new artifact with the same properties. To reproduce an artifact the
individual has available a separate neural network, distinct from the neural network
that allows the individual to perceive and approach the food elements. This network
reproduces an artifact by learning to generate an output identical to its input (an
auto-association task). To add new variants to the population of artifacts, some noise
is introduced in the copying process by randomly modifying to some extent the
perceived properties of the “model” artifact. Therefore, the copies that are generated
at the end of learning are similar but not identical to the inherited models. (To
determine the quality of an artifact, a continuous variant of the Hamming distance is
employed.) After learning is completed, each of the 100 individuals lives an ordinary
life (1,000 input/output cycles) searching for food and using the 5 artifacts it has
produced by copying the models. At the end of life the 20 individuals with the highest
energy are selected for reproduction using the same procedure as in the preceding
generations without artifacts. However, the energy level of an individual is now
computed by multiplying the number of food elements collected by the individual by
the average utility of the 5 artifacts used by the individual. Therefore, the 20
individuals that are selected for reproduction are identified on the basis of two factors:
(a) the individual’s ability to collect food, and (b) the quality of the artifacts used by
the individual.
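The two factors behind parent selection combine multiplicatively. A minimal sketch of that energy computation and of the top-20 selection, with hypothetical function names and toy numbers:

```python
def energy(food_collected, artifact_utilities):
    """Food elements collected x average utility of the artifacts the individual used."""
    return food_collected * sum(artifact_utilities) / len(artifact_utilities)


def select_parents(food, utilities, k=20):
    """Indices of the k individuals with the highest artifact-weighted energy."""
    scores = [energy(f, u) for f, u in zip(food, utilities)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# A weaker forager with better artifacts can outrank a stronger one: 10 * 2.0 > 12 * 1.5.
print(select_parents([10, 12], [[2.0] * 5, [1.5] * 5], k=1))  # [0]
```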

The 100 offspring of the reproducing individuals of the first generation using
artifacts constitute the second generation using artifacts. What artifacts do these
individuals inherit from the preceding generation as models to be reproduced? Each
generation uses a total of 500 different artifacts (5 artifacts per individual). These
artifacts tend to be all different because of the differences among the individuals that
have reproduced them and because of the random noise introduced into the copying
process. How are the 100 artifacts that will function as models for reproduction in the
next generation selected from these 500 different artifacts? The criteria for deciding
which artifacts are selected for reproduction by the next generation vary across the
simulations we now describe. The simultaneous evolution of individuals and of their
artifacts continues until generation 600.

3. Results 

3.1 Perfect knowledge of the quality of artifacts 

In this simulation we imagine that the individuals have perfect knowledge of the
quality (utility) of each artifact and that they decide to reproduce (and use) the best
artifacts of the preceding generation. Of the 500 artifacts of each generation, the 100
artifacts with the smallest Hamming distance from the prototype (the best artifact)
are selected for reproduction, and each individual of the following generation is
assigned 5 of these 100 artifacts as models to be reproduced.
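With perfect knowledge, selection reduces to sorting by distance from the best artifact. A sketch assuming artifacts are 10-bit tuples compared by the plain Hamming distance; the function names are illustrative.

```python
import random


def hamming(a, b):
    """Number of positions at which two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def select_by_perfect_knowledge(artifacts, best, n_models=100):
    """Keep the n_models artifacts with the smallest Hamming distance from the best one."""
    return sorted(artifacts, key=lambda a: hamming(a, best))[:n_models]


def assign_models(models, n_individuals=100, per_individual=5, seed=0):
    """Give every individual 5 models from distinct list entries; a model may serve many."""
    rng = random.Random(seed)
    return [rng.sample(models, per_individual) for _ in range(n_individuals)]
```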

The results of the simulation show that for the 20 best individuals there is a very
rapid increase in fitness during the first 25 or so generations, after which fitness
remains practically invariant for the rest of the simulation, whereas for the total
population fitness also increases rapidly in the early generations but then continues to
grow, albeit more slowly, until the end of the simulation. Moreover, comparing fitness
with and without artifacts, we find a huge difference for the 20 best individuals but a
much smaller difference for the total population.

With respect to the evolutionary improvement in the quality of artifacts, the results
show an extremely rapid improvement in the first generations followed by a very
stable state.
3.2 The artifacts selected for reproduction are those used by the best
individuals of the preceding generation

Perfect knowledge of the quality of artifacts is an ideal case. To know the quality of
an artifact perfectly, an individual would have to know the absolutely best artifact,
i.e., the bit sequence we selected as defining the best artifact. But we cannot suppose
that this knowledge is accessible to our individuals. Therefore, in this and the
following simulations we experiment with various ways in which our individuals can
judge the quality of the artifacts used by the individuals of the preceding generation in
order to decide which artifacts to reproduce. We assume that an individual can infer
the quality of the artifacts used by the individuals of the preceding generation by
examining the energy level of these individuals: individuals with more energy can be
supposed to have used better artifacts. In the present simulation the artifacts selected
as models to be reproduced are those used by the 20 individuals with the highest
energy in the preceding generation, which are therefore the parents of the individuals
of the current generation. Each of these 20 individuals has used 5 artifacts during its
life, and their 20x5=100 artifacts are selected as models for reproduction.

The results show that for both the 20 best individuals and for the entire population 
(100 individuals) the total energy level (number of food items multiplied by the 
average quality of the artifacts used by an individual) is lower than in the previous 
simulation with perfect knowledge. This indicates that the less than perfect 
knowledge of the quality of artifacts, which in this simulation can only be inferred 
from the energy level of the individuals using them, makes the selection process for 
the artifacts less effective, and this is reflected in the lower energy level of the 
individuals using the artifacts. On the other hand, the ability to collect food does not
change in this simulation with respect to the preceding one.

The results concerning the evolution of the quality of artifacts provide a direct
confirmation that less than perfect knowledge of the quality of artifacts makes the
selection process of the artifacts less effective. At the end of evolution the quality of
the average artifact is around 1.65, whereas in the preceding simulation with perfect
knowledge the average artifact had a quality greater than 1.85. It is also interesting
that the evolutionary improvement of the quality of artifacts is slower in the present
simulation than it was in the previous one.

3.3 All the artifacts are tested by all individuals 

The method used in the preceding simulation for selecting the artifacts for
reproduction has the limitation that the artifacts selected are those used by the 20
individuals with the highest energy. But the individuals with the highest energy are
likely to be especially good at procuring food, and therefore their high energy level
may be due more to their ability to procure food than to the high quality of their
artifacts. To better evaluate the quality of the artifacts, it might be more appropriate
to have many individuals, with different levels of food procurement ability, use the
same artifact and then judge the quality of the artifact on the basis of the average
energy level of the individuals that have used it. This should make it possible to
separate the contribution of the artifact’s quality to an individual’s energy level from
the contribution of the individual’s ability to procure food.
The present simulation implements this idea. In each generation we allow all 100
individuals to live for another 1,000 cycles. These additional cycles serve to evaluate
all 500 artifacts used by the individuals of that generation. Each of the 100 individuals
is assigned 20 different artifacts: the 5 artifacts the individual used during its regular
life and 15 additional artifacts randomly selected from the 495 artifacts used by the
other individuals. Each of the 20 artifacts is used by the individual for 50 cycles. In
this way each of the 500 artifacts is likely to be used by several different individuals
(by 4 individuals on average, since 100 × 20 = 2,000 uses are spread over 500
artifacts), and these individuals are likely to have varying levels of the ability to
procure food. At the end of this evaluation period, the 500 artifacts are ranked on the
basis of the average energy level of the individuals that have used them during the
evaluation period, and the 100 highest ranking artifacts are selected as models to be
reproduced by the individuals of the next generation.
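The testing period can be sketched as below. To keep the sketch self-contained, foraging itself is not simulated: we assume, hypothetically, that individual i collects food at a constant rate `food_rate[i]` per cycle, so using artifact a for 50 cycles yields `food_rate[i] * 50 * utilities[a]` energy. Artifacts are indices into a list of utilities, and `own[i]` holds the indices of the 5 artifacts individual i used during its life.

```python
import random


def evaluate_by_testing(utilities, own, food_rate, n_extra=15, cycles=50,
                        n_models=100, seed=0):
    """Rank artifacts by the average energy their testers obtained; return the top indices.

    utilities[a] : quality multiplier of artifact a (500 artifacts in the text)
    own[i]       : indices of the 5 artifacts individual i used during its life
    food_rate[i] : hypothetical food elements per cycle collected by individual i
    """
    rng = random.Random(seed)
    totals = [0.0] * len(utilities)
    counts = [0] * len(utilities)
    for i, mine in enumerate(own):
        others = [a for a in range(len(utilities)) if a not in mine]
        for a in list(mine) + rng.sample(others, n_extra):   # 5 + 15 = 20 artifacts
            totals[a] += food_rate[i] * cycles * utilities[a]  # energy over 50 cycles
            counts[a] += 1
    average = [t / c if c else float("-inf") for t, c in zip(totals, counts)]
    ranked = sorted(range(len(utilities)), key=lambda a: average[a], reverse=True)
    return ranked[:n_models]


utilities = [1.0 + i / 500 for i in range(500)]
own = [list(range(5 * i, 5 * i + 5)) for i in range(100)]
models = evaluate_by_testing(utilities, own, food_rate=[0.02] * 100)
```

With identical testers the ranking recovers artifact quality exactly; when `food_rate` varies widely across testers, the averages mix artifact quality with tester ability, which is the source of noise discussed in the next simulation.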

The results of this simulation are not very different from those of the preceding
simulation. Since we added the period of artifact testing precisely to obtain better
evaluations of the artifacts, these results are disappointing.

3.4 All the artifacts are tested by the 20 best individuals 

The method of testing the quality of the artifacts used in the preceding simulation may 
be too noisy. Since all the 100 individuals of each generation are involved in 
evaluating the artifacts, even individuals with very low levels of the ability to procure 
food evaluate the artifacts and therefore good artifacts may not be selected just 
because they happen to be evaluated by individuals with very low levels of the food 
collecting ability. We have therefore run another simulation in which the individuals 
that are asked to evaluate the 500 artifacts during the evaluation period are not all the 
100 individuals constituting an entire generation but only the 20 best individuals. In 
this way we hope to factor out the role of an individual’s food getting ability in 
determining the evaluation of an artifact, since the best 20 individuals all tend to have
high levels of food collecting ability and, at the same time, we obtain evaluations for 
all the artifacts used by each generation. 

The new system for selecting the artifacts to be reproduced gives better results
than the preceding system. With the new system the energy level of both the 20 best
individuals and the entire population reaches higher levels, and reaches them more
quickly, than with the preceding system in which all individuals, good and bad, tested
the artifacts. Furthermore, the average quality of the artifacts reaches 1.7 in this
simulation compared with slightly above 1.6 in the preceding simulation.

4. Cultural diffusion of artifacts

Imagine two separate populations that are both evolving their artifacts. The two 
populations are not at the same stage of technological evolution. Population A is 
technologically more advanced than Population B. In fact, the measured utility of the 
artifacts of Population A has already converged to the steady state of maximum utility
typically observed in this type of simulation, whereas the artifacts of Population B
still have a long way to go in terms of increasing their utility. At this point, for some
reason, one or more artifacts of Population A are moved to Population B and become
members of the pool of artifacts of Population B. What are the consequences of this
phenomenon of cultural diffusion?

We assume that the best artifacts of Population A are moved to Population B but 
we vary the number of artifacts that are moved. In one simulation only the single best 
artifact used by the individuals of the last generation (generation 600) of Population A 
is moved to Population B. In other simulations either the third best or the 4 best 
artifacts of Population A are moved to Population B. When an artifact of Population 
A is moved to Population B, the artifact simply replaces a randomly selected artifact 
of Population B. Everything else remains the same. The alien artifacts become
members of the pool of artifacts of Population B when Population B is at generation
15 of its technological evolution. The technological evolution of Population B
continues until generation 600 is reached.
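The import step is simple to state in code. This sketch assumes each imported artifact overwrites a distinct, randomly chosen slot of the host pool (the text says only that it replaces a randomly selected artifact); `donor_utility` is a hypothetical stand-in for however Population A's artifacts are ranked.

```python
import random


def import_artifacts(host_pool, donor_pool, donor_utility, n_import, seed=0):
    """Copy the n_import best donor artifacts into a copy of the host pool."""
    rng = random.Random(seed)
    best = sorted(donor_pool, key=donor_utility, reverse=True)[:n_import]
    pool = list(host_pool)                       # everything else stays the same
    for slot, artifact in zip(rng.sample(range(len(pool)), n_import), best):
        pool[slot] = artifact                    # replaces a randomly selected host artifact
    return pool


host = [("h", i) for i in range(100)]
donor = [("d", i) for i in range(100)]
new_pool = import_artifacts(host, donor, donor_utility=lambda a: a[1], n_import=4)
```

Whether the imported lineages then persist depends on the subsequent selective reproduction in Population B, which is what results A to D below describe.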

The results can be summarized as follows:

A. The imported artifacts do not always become permanent members of the
technological pool of Population B. If we reconstruct the artifact lineages we find that 
in some cases the lineages of imported artifacts disappear from the technological pool 
of the host population. 

B. Lineages of imported artifacts are more likely to disappear the smaller the
number of originally imported artifacts.

C. If one or more alien lineages become permanent members of the technological
pool of Population B, they may cause an acceleration in the evolutionary
improvement of the average quality of the artifacts of Population B. Hence, given (B),
we can expect to observe a positive effect of the introduction of better external
artifacts only if a sufficient number of such artifacts are introduced in the host
population.

D. Even when there is a positive effect of the introduction of better ahen artifacts, 
this effect is likely to be progressively cancelled in the further evolution of the host 
population. 

5. Discussion 

A population with artifacts is able to extract more energy from the environment than a
population without artifacts. Although in the simulations we have described
population size is fixed, if population size varied as a function of energy, a population
with artifacts would outnumber and possibly drive to extinction a population without
artifacts. In our simulations the artifacts are of different quality, i.e., they vary in their
capacity to increase the amount of energy obtained from the collected food. If there is
a tendency for better artifacts to be reproduced and if new variants are periodically
added to the pool of artifacts, the average quality of artifacts tends to increase.

5.1 How are artifacts selected for reproduction? 

One problem explored in our simulations is how artifacts are selected for
reproduction. Since the properties of individual artifacts correlate with artifact quality,
if one knew the properties of the best artifact one would select for
reproduction the artifacts possessing those properties. We have shown that if selection
of artifacts is based on perfect knowledge of which properties of artifacts correlate
with artifact quality, we obtain the best results in terms of evolutionary increase of
artifact quality. But how can knowledge about the properties of artifacts correlated
with artifact quality be acquired? The performance of an artifact cannot be determined by
just examining the artifact’s properties. To decide how good an artifact is, one has to
measure the performance of an individual using the artifact. In our simulations the
performance of an individual is measured by the amount of energy the individual is
able to obtain from collecting food and using the artifacts. But this poses the problem
of separating the role of the individual’s food collecting ability from the role of
artifact quality in determining the total amount of energy obtained by the individual.

We have explored various strategies that can be used to achieve this separation:
(1) the best artifacts are those used by the best individuals, i.e., those that obtain the
highest amount of energy during their life; (2) the best artifacts are those that give the
best results when they are tested by all individuals, i.e., individuals falling within the
entire range of amounts of energy collected during life; (3) the best artifacts are those
that give the best results when they are tested by the best individuals.
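The three evaluation strategies can be contrasted in a small sketch. It assumes, as in the example of Sect. 5.2, that the energy an individual obtains is its food-collecting ability multiplied by the quality of the artifact it uses. All function names, the number of testers and the `top_k` cut-off are hypothetical parameters, not values from the simulations.

```python
import random
from statistics import mean


def energy(ability, quality):
    # Assumption: energy = food collected x artifact quality.
    return ability * quality


def score_indirect(abilities, qualities, rng):
    """(1) Each artifact is scored by the energy of the one individual using it."""
    users = rng.sample(range(len(abilities)), len(qualities))
    return [energy(abilities[u], q) for u, q in zip(users, qualities)]


def score_all_testers(abilities, qualities, n_testers, rng):
    """(2) Each artifact is tested by individuals drawn from the whole population."""
    return [mean(energy(rng.choice(abilities), q) for _ in range(n_testers))
            for q in qualities]


def score_best_testers(abilities, qualities, n_testers, top_k, rng):
    """(3) Each artifact is tested only by individuals drawn from the top_k best."""
    best = sorted(abilities, reverse=True)[:top_k]
    return [mean(energy(rng.choice(best), q) for _ in range(n_testers))
            for q in qualities]


def select_best(scores):
    """Index of the artifact chosen for reproduction."""
    return max(range(len(scores)), key=scores.__getitem__)


rng = random.Random(1)
abilities = [float(a) for a in range(50, 150)]  # food collected per life
qualities = [1.0, 1.2, 1.5, 1.1]
s3 = score_best_testers(abilities, qualities, n_testers=5, top_k=1, rng=rng)
print(select_best(s3))  # -> 2 (the artifact of quality 1.5)
```

With `top_k=1` every artifact is tested by the same individual, so strategy (3) ranks artifacts exactly by quality; in strategies (1) and (2) the tester's own ability leaks into the score.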

Selecting for reproduction the artifacts used by the best individuals (first strategy)
does not seem to be a good choice, because the best individuals may obtain the large
amounts of energy that allow them to reproduce not because of the quality of the
artifacts they use but because of their high level of ability in collecting food. This
result seems to indicate that the strategy Boyd and Richerson call “indirect bias”
in cultural evolution [1] may not be a very efficient one. “Indirect bias” is the strategy
of selecting for reproduction those artifacts (behaviors, ideas, etc.) that are used
(exhibited, produced, etc.) by the best individuals in the population. Given the
impossibility or difficulty of directly assessing the quality of an artifact, one chooses
to reproduce the artifacts used by the best individuals with the implicit assumption
that the best individuals obtain their high level of performance because of the high
quality of the artifacts they use. But in fact the best individuals may reach high
performance levels for reasons other than the quality of the artifacts they use.

The second strategy may not be appropriate for the opposite reason. If one has the
entire population test all the artifacts, one risks that good artifacts may not be
reproduced because they have been tested by individuals with a very low level of food
collecting ability. An artifact may get bad scores not because it is of inferior quality
but because the individuals that test it are not very good at collecting food
and therefore obtain low scores even with good artifacts.

The results of our simulations seem to show that a third strategy may be a
better one. This strategy is based on observing the performance of the best individuals
of the population while each of them is using a randomly selected subset of all the
artifacts, and on selecting for reproduction the artifacts that on average result in better
performances. (This is a form of “direct bias” in Boyd and Richerson’s terminology.)
This strategy factors out the role of an individual’s food getting ability in determining
the individual’s performance, because all the individuals that test the artifacts are very
good at getting food; this leaves the quality of the artifact as the only factor that
determines the average performance obtained by different good individuals using the
same artifact. Of course, this strategy has to pay the cost of observing repeated
performances of different individuals using the same artifact. This may explain why
real individuals may prefer yet another strategy, the one Boyd and Richerson
call the “frequency bias” strategy. According to this
strategy, the artifacts (behaviors, ideas, etc.) selected for reproduction are those that
appear to be used more frequently in the population (cf. also [5], [6]). However, the
scenario used in our simulations did not allow us to test this strategy.

5.2 Artifacts amplify interindividual differences 

An interesting result of our simulations is that, apparently, artifacts tend to amplify
interindividual differences. When we compare the distance between the evolutionary
curves of energy for the 20 best individuals and for the entire population of 100
individuals with and without artifacts, we find that this distance is greater with
artifacts than without artifacts. In other words, the difference in total energy between
the best individuals and the average individual in the population tends to be greater
when one includes the role of artifacts in determining this total energy than if one
only considers an individual’s food collecting ability. For example, two individuals
that are able to collect 100 and 150 food elements, respectively (difference in energy
= 50), will have 150 and 225 units of actual energy (difference in energy = 75) if they
both use artifacts of quality 1.5.
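The arithmetic of this example can be checked directly, assuming (as the example implies) that actual energy equals food collected multiplied by artifact quality. Under that assumption any gap in food collected is multiplied by the artifact quality; the function name `actual_energy` is ours.

```python
def actual_energy(food_collected, artifact_quality):
    # Assumption: artifact quality multiplies the energy from collected food.
    return food_collected * artifact_quality


low, high, quality = 100, 150, 1.5
gap_without = high - low                                                 # 50
gap_with = actual_energy(high, quality) - actual_energy(low, quality)   # 75.0

# In general the gap grows by the factor `quality`: (high - low) * quality.
assert gap_with == (high - low) * quality
```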

This simple result may be of some interest for interpreting the evolution of human
societies. Technological evolution may have played a role in increasing social
stratification in human societies. In a population with little or no technology the
differences in energy among individuals depend exclusively on the differences in
their personal characteristics. Since these differences are generally not very great
in absolute terms, a population with little or no technology tends to
be egalitarian in terms of interindividual differences in available energy. However,
artifacts may amplify interindividual differences in energy. Hence,
the richer and more sophisticated technology associated with the Upper Paleolithic
and, especially, with the Neolithic and the development of metallurgy may have played a
role in increasing the economic and social stratification of human societies in these
periods.

5.3 Diffusion of artifacts 

Our simulations show that if the artifacts of a population with more evolved artifacts 
are brought into the artifact pool of another population with less evolved artifacts, this 
may result in some sort of boost for the technological evolution of the second 
population. Both the total energy of the host population and the quality of its artifacts 
increase more rapidly than if the population continues its evolution with no external 
influence. 

This result must be qualified in two ways, however. First, the more evolved
imported artifacts have an influence only if they are not too few in number. Second,
the boost in the technological evolution of the host
population may be a temporary one. A population which does not experience the
beneficial influence of external artifacts tends, in the long run, to reach the same level
of technological development as another population whose technological pool has
been beneficially “infected” by better external artifacts. This result, however, may be
restricted to our simple simulation scenario. In more complex, and realistic, scenarios
a temporary boost in technological development for a certain class of artifacts can
trigger the development of new artifacts in such a way that the population with no
experience of better external artifacts may never close the distance from a
population which has had this experience. In any case, even if the boost in
technological development due to the arrival of better external artifacts is temporary,
it can still have important long-term consequences. In a simulation scenario with
variable population size, a population experiencing a technological acceleration due to
better imported artifacts may have a time window during which it may outnumber and
perhaps drive to extinction another, less technologically advanced population.
Abstract. This paper presents a simulation of the evolution of altruism in a
population of agents. The agents can behave altruistically by stepping into water at
the cost of drowning, thereby forming a bridge with their body for other agents.
We show that altruism evolves only when relatives have a higher probability of
benefiting from it. When the population is mixed up, no altruism evolves, even though
the population goes extinct as a result.



1 Introduction 

Altruistic behavior benefits others at the expense of the altruist. How could such
behavior evolve, if natural selection selects for the fittest individual? One explanation
comes from the field of game theory, where it has been shown that reciprocal altruism,
which benefits the altruist in the long run, can be evolutionarily stable [1]. This
paper, however, focuses on non-reciprocal altruism¹, which is more problematic from
the point of view of evolutionary biology. Some simulations of non-reciprocal altruism
have been done [2,3], but with a very different design. The setup of this simulation
is particularly relevant for the controversy over the unit of selection, which focuses
on the question ‘on what unit does selection operate: the gene, the individual, or
the group?’ In our simulation a population of agents has to survive and reproduce in
a world consisting of islands. In order to survive they have to gather food from the
islands, but they can only reach other islands by using another agent’s body as a bridge.
But when this other agent steps into the water to form a bridge, it dies instantaneously.

¹ The term non-reciprocal altruism is used to distinguish it from reciprocal altruism, while
avoiding the term genuine or true altruism, which tends to be associated with intentional
altruism, a different topic altogether.



2 Simulation Setup

The world consisted of 110 by 110 cells placed on a grid (torus). Each cell could
either be a water cell or a land cell. Within this grid, 100 islands, each 10 by 10 cells,
were placed. Canals of 1 cell in width separated the islands. Land cells could contain
food or an agent, and water cells could contain a dead agent forming a bridge. A
living agent could cross the canal by using this bridge.

Agent behavior: The agent’s behavior was determined by four parameters: p-water,
p-food, p-land and p-cross. Each of these parameters corresponded to one of
four types of cell the agent could see in front of it: empty water, food on land, empty
land, or a floating agent forming a bridge. The parameters represented the probability
of stepping forward when in front of the corresponding cell. A random number was drawn
from the interval [0,1]. When this number was below the corresponding p value the
agent would step forward; otherwise it would randomly turn left or right on the spot.
The agent used no information apart from the cell in front of it, and it had no
internal state. Agents could not step on top of other living agents: when a living agent
was in front of it, an agent always turned randomly left or right on the spot. The agents
were synchronously updated.
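A minimal sketch of this decision rule, assuming a string encoding of the cell types and a per-agent dictionary of p values. The names `choose_action`, `WATER`, `FOOD`, `LAND`, `BRIDGE` and `AGENT` are illustrative, not taken from the authors' simulators.

```python
import random

# Illustrative encoding of what an agent can see in the cell in front of it.
WATER, FOOD, LAND, BRIDGE, AGENT = "water", "food", "land", "bridge", "agent"


def choose_action(front_cell, p, rng):
    """Return 'forward', 'left' or 'right' for one synchronous update.

    p maps WATER, FOOD, LAND and BRIDGE to p-water, p-food, p-land and p-cross.
    """
    if front_cell == AGENT:  # living agents block the way
        return rng.choice(("left", "right"))
    if rng.random() < p[front_cell]:  # draw below the p value -> step forward
        return "forward"
    return rng.choice(("left", "right"))  # otherwise turn on the spot


rng = random.Random(42)
p = {WATER: 0.008, FOOD: 1.0, LAND: 0.8, BRIDGE: 0.8}
print(choose_action(FOOD, p, rng))  # -> forward (p-food is 1.0)
```

Because the rule looks only at the cell in front and keeps no state, each agent is fully described by its four p values, which is what makes them the sole heritable traits.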

Food and Energy: After each cycle, one hundred cells were randomly chosen anywhere
on the map. A food patch was placed on each chosen cell that was an empty land
cell. When an agent stepped on a food patch, the energy of the food patch (EnergyFood)
was added to the agent’s energy and the food patch was removed. The agent always lost
one unit of energy per iteration. When the agent used up all its energy it died of
starvation.
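The food and energy rules can be sketched as below. The grid encoding, the function names, whether the hundred cells are drawn with replacement, and the order in which food is added and the per-iteration cost subtracted are all our assumptions; `energy_food` defaults to 9, the EnergyFood value listed for Simulation 1.

```python
import random


def place_food(grid, n_cells, rng):
    """Pick n_cells random cells (with replacement, an assumption) and put a
    food patch on each one that is empty land. grid maps (x, y) to one of
    'water', 'land', 'food' or 'agent' (hypothetical encoding)."""
    cells = list(grid)
    for _ in range(n_cells):
        cell = rng.choice(cells)
        if grid[cell] == "land":
            grid[cell] = "food"


def update_energy(energy, ate_food, energy_food=9):
    """One iteration of bookkeeping; returns (new_energy, starved)."""
    if ate_food:
        energy += energy_food  # EnergyFood is added when stepping on food
    energy -= 1  # one unit is lost every iteration
    return energy, energy <= 0
```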

Water: When stepping into water the agents immediately drowned. The agents 
remained ‘floating’ during a number of iterations as determined by the BridgeTime 
parameter. While floating, other agents could step on top of the agent using it as a 
bridge. 

Reproduction: The agents could multiply through asexual reproduction. An
agent gave birth when it had gathered enough energy (EnergyToGetChild). When a
child was born, the parent’s energy was reduced by a certain amount
(EnergyCostChild), and this formed the starting energy of the child. The child
inherited all four of the parent’s p values, but for each child a slight mutation was added.
For each mutation, a random number was drawn from the interval [-1,1],
multiplied by the Mutation parameter and added to the corresponding p value.
Resulting p values outside the [0,1] interval were clipped. This procedure was
repeated for all p values. A newborn agent was placed on the cell directly behind its
parent.
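The reproduction and mutation rule translates into a short function. It is a sketch under stated assumptions: birth is triggered when energy reaches EnergyToGetChild (we use `>=`), the defaults are the Simulation 1 values (Mutation 0.004, EnergyToGetChild 120, EnergyCostChild 60), and placing the newborn behind its parent is left out. The names `maybe_reproduce` and `clip01` are hypothetical.

```python
import random


def clip01(x):
    return min(1.0, max(0.0, x))


def maybe_reproduce(energy, p, rng, mutation=0.004,
                    energy_to_get_child=120, energy_cost_child=60):
    """Asexual reproduction with mutation.

    Returns (parent_energy, child), where child is None or a tuple
    (child_energy, child_p)."""
    if energy < energy_to_get_child:  # assumption: birth once energy >= threshold
        return energy, None
    # Each p value gets its own draw from [-1, 1], scaled by Mutation, then clipped.
    child_p = {k: clip01(v + rng.uniform(-1.0, 1.0) * mutation)
               for k, v in p.items()}
    return energy - energy_cost_child, (energy_cost_child, child_p)
```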

Age: When agents reached the maximum age (MaxAge), expressed in the number
of iterations after birth, they died of old age.

Initialization: At the start of each simulation, a number of agents (InitNumAgents)
and a number of food patches (InitNumFood) were placed at random locations,
excluding water cells. Depending on the simulation, initial p values were randomly
chosen or set to a specified value. Each initial agent’s energy was randomly
chosen from the interval [0, EnergyToGetChild]. Its initial age was randomly chosen
from the interval [0, MaxAge].

Simulator: The simulations were performed with a self-built simulator. To rule
out any effect of bugs, two simulators were independently developed by the two
authors, without accessing each other’s code, using two different languages.
3 Simulation 1

Two simulations were run, one normal and one control. (The default values for all
simulations were: MaxAge 250, EnergyToGetChild 120, EnergyCostChild 60, EnergyFood
9, BridgeTime 30, Mutation 0.004, InitNumFood 30/Island, InitNumAgents 15/Island.)
Both simulations were run for 4 million iterations. There is strong selection for
high p-food and high p-cross, driving both quickly towards 1.0. A high p-cross
increases the chance that a new island is reached once another agent has body-bridged.
On average 21% of the islands were uninhabited, and these quickly filled up with food.
Reaching such an island made it possible to give birth to a large number of
offspring. The p-land parameter reached its stable value at about 0.8. The parameter
of our main concern, p-water, immediately dropped to a value close to zero. There is
strong selection against high p-water values because of the lethal consequences of
body-bridging. Even though p-water was only 0.0080 (all reported values are averaged
over the last 100,000 iterations of the simulation), the percentage of agents that
died of drowning was high: 20%.

In the control simulation we undermine the effect of kin selection by distributing
relatives randomly over the map, so that when an altruist body-bridges, the other
agents on that island have no systematically higher p-water than the population
average. We mixed up the population with the following procedure. When a parent
gave birth, the child was not immediately placed on the map, but only after a second
parent gave birth. The first child was then placed next to the second parent, while the
second child had to wait for a third parent to give birth before it was placed on the
map, and so on. In practice the delay of placement was on average 3/4 of a cycle. With this
swap method the balance between food and newborns was not affected in any way,
and we avoided randomly placing newborns on the map, which would take away the
need for island hopping because new agents could then be born onto an uninhabited
island. The general pattern of the control simulation was similar to that of the normal
simulation, but p-water now reached a clearly lower stable value of 0.0046, indicating
that at least part of the p-water value in the normal simulation is attributable to
non-reciprocal altruism.
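The swap procedure amounts to a one-slot buffer of newborns. The sketch below models only the full-swap control condition; the function name `swap_placement` and the representation of births as a sequence of parent identifiers are hypothetical, and how partial swapping (p-swap below 1.0 in Simulation 2) interacts with the buffer is not specified in the text.

```python
def swap_placement(births):
    """Full-swap condition: each newborn is held back until the next birth
    and is then placed next to that next parent.

    births: parent identifiers in the order in which they give birth.
    Returns (placements, pending), where placements lists
    (parent_of_child, parent_placed_next_to) pairs and pending is the
    parent whose child is still waiting in the buffer."""
    placements = []
    pending = None  # parent whose child is waiting for the next birth
    for parent in births:
        if pending is not None:
            placements.append((pending, parent))
        pending = parent
    return placements, pending


placements, pending = swap_placement(["A", "B", "C"])
# A's child goes next to B, B's child next to C; C's child is still pending.
```

Since every child is placed exactly when another child is born, the number of newborns on the map (and hence the food balance) is unchanged, while relatives end up next to unrelated parents.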



4 Simulation 2 

If kin selection is the driving force for the evolution of altruistic behavior, how
crucial is the exact distribution of relatives? In this simulation we varied the probability
that an agent was swapped (p-swap) from 0.0 to 1.0 in steps of 0.1. All conditions were
run for 500,000 iterations and replicated 10 times. Because p-water was our
only concern, all following simulations used fixed p values except for p-water: p-food
was fixed at 1.0, p-land at 0.8, and p-cross at 0.8. This was done to decrease the time it
took to reach a stable p-water. The values were based on an earlier exploratory
simulation of 1 million iterations. All other parameters were set to the default values,
except for the Mutation parameter, which was set to 0.0016.
Fig. 1. The p-water values plotted against p-swap.

Fig. 1 displays the final p-water values averaged over the 10 replications for each
condition. As p-swap increases, p-water decreases rapidly, indicating that the
amount of altruism is inversely related to the amount of kin mixing.
5 Simulation 3

In Simulation 2 the percentage of agents drowning in the p-swap 1.0 condition was
still 9%, although evolution is expected to select against drowning. If altruism indeed
depends on the clustering of relatives, we should expect no altruism to evolve with
p-swap set to 1.0. We therefore assume that the remaining drowning rate is caused by noise
added by the mutation parameter. Could we reduce the drowning rate by lowering the
Mutation parameter? In this simulation the Mutation parameter was set to 0.0001, and
the simulation was run for 2,500,000 iterations in both the normal and the swap condition.




Fig. 2. The p-water values plotted for all agents for a number of iterations. 

The normal condition showed the same pattern as the normal conditions
in Simulations 1 and 2. The final p-water value was 0.0071. To get an impression of
how the evolutionary process took place, we plotted the p-water values for all agents
for a number of iterations. Such a plot is a recording of the agents’ (artificial)
evolutionary history. In the swap condition, lowering the mutation parameter did indeed
seem to lower the p-water value. As can be seen in Fig. 2b, p-water reaches values of
about 0.0005 at iteration 80,000, much lower than the final value of 0.0046 in
Simulation 1b. However, due to the lack of transportation to new islands, the whole
population went extinct. This result stresses that the intuition that altruism evolves
because of its value for the whole is flawed. Without the proper distribution of genes
there will be evolution towards egoism, even at the cost of extinction.



6 Discussion 

The altruistic behavior in our simulation is not directed explicitly towards kin, and
can also benefit unrelated individuals. Unrelated individuals can invade other inhabited
islands by using body-bridges, and in Simulation 2 also by child swapping. The
inhabitants of an island could, therefore, be interpreted as a group according to
Williams’ [4] definition of a group as ‘something other than a family and to be composed
of individuals that need not be closely related’ (p. 93). However, in the same book
Williams refers to kin selection as an alternative to group selection (p. 97). No such
distinction can be made in our simulation. Although our results can be interpreted as a
kind of group selection, this is certainly not an alternative process to kin selection. We
thus conclude that talk of group selection only makes sense if it is interpreted as
another perspective on kin selection (see [5]).
Abstract. It has become increasingly apparent that spatial and other
forms of ecological situatedness can introduce radical differences in the
evolutionary outcome of models of conflictive social behavior. Cooperative
interactions are often found to have an increased viability in spatially
situated models. One possible explanation for this phenomenon makes
use of kin-selective arguments according to which high relatedness between
neighbors stabilizes cooperation. Unfortunately, in some cases the
argument does not go beyond the merely verbal. This paper shows that
an explanation in terms of kin selection can easily be tested in a computer
simulation and that, in the particular case treated here, the result
of such verification is negative, thus strengthening previous conclusions
regarding the relevance of other factors such as discreteness, stochasticity
and ecological organization.



1 Introduction 

A possible contribution of Artificial Life to real biology is in the use of relatively
novel techniques for addressing already established problems in the biological
literature. This is the case, for instance, with studies that take a problem from
evolutionary biology and try to complement what is known about it with extended
models, mainly in the form of computer simulations (Bullock, 1997; Noble,
1998; Di Paolo, 1997 and others).

The type of contribution provided by these models does not differ much from 
the use of similar techniques by biologists themselves (e.g. Durrett & Levin, 1994; 
Krakauer & Pagel, 1995). The difference is perhaps one of style and maturity. 
Theoretical biologists tend to be conservative and address specific issues. In 
contrast, many Artificial Life models tend to be more exploratory, addressing 
different issues at the same time and mixing (sometimes, but not always, in a 
less than clear manner) points regarding evolutionary stability, ecological and 
physical constraints, cognitive mechanisms, etc. 

This wider exploratory attitude is simultaneously a source of excitement and
inventiveness and a source of poor scientific methodology. With very few
exceptions (e.g. Noble & Cliff, 1996), researchers have failed to turn a critical
eye on their own work and that of others. This article is intended to be viewed
as belonging to this rare class. However, instead of critically addressing other
specific models, it will concentrate on showing how an assumption has been
repeatedly used as an explanation for what goes on in such models without any
warrant for its applicability and, worse still, without any confirmation that the
conditions for its use were met, when such confirmation should be relatively easy
to obtain.

It has become apparent that spatial situatedness can enhance the evolutionary
viability of cooperative social behaviors under circumstances of conflict
of interest (e.g. Krakauer & Pagel, 1995; Nakamaru, Matsuda & Iwasa, 1997;
van Baalen & Rand, 1998). A common situation is that, for certain interaction
games, cooperation is unviable in the mixed-medium approach but stable if
players are spatially distributed and interact locally. There are different possible
explanations for this phenomenon, but one of the most favored, at least
intuitively, is the use of kin-selective arguments according to which most games are
played between highly related individuals (due to the limited dispersal or ‘viscosity’
of the population) and therefore cooperation is understood as the strategy
that maximizes inclusive fitness (Hamilton, 1964; see also section 3).

Unfortunately, in some cases this explanation has been merely verbal despite
the potential for actually confirming its applicability (e.g. Ackley & Littman,
1994; Oliphant, 1996 and others). This careless appeal to kin-selective arguments
is dangerous. One is tempted to wrongly infer that spatial situatedness always
implies the validity of such arguments¹ and, on the flip side, one is tempted to
ignore other possible factors that may play a significant role. It is also bad science,
since the means exist for verifying whether the chosen explanation is correct or
not and there is no justification for not doing so.

¹ On the contrary, some models have shown that under certain circumstances the
positive effect of higher relatedness between neighbors is actually cancelled by the
negative effect of increased local competition (Taylor, 1992a; 1992b; Wilson, Pollock
& Dugatkin, 1992). See also (Kelly, 1992; van Baalen & Rand, 1998).

This paper will show in some detail that a verification of the applicability of
kin-selective arguments can be achieved fairly simply. The study will be based
on an evolutionary model presented elsewhere (Di Paolo, 1997, 1999, submitted)
but, due to limitations of space, will not go much into the details of that model.
Finally, the paper will support the conclusion that cooperative outcomes in spatial
games need not be tied to kin-selective arguments and that other ecological
factors such as spatial organization, discreteness and stochasticity can be
responsible for the selective stability of cooperative behaviors. The paper will,
therefore, be less than exciting and perhaps a little on the boring, but necessary,
side of scientific research.

2 The game

The model presented in (Di Paolo, 1997; submitted) describes a simple action-response
(two-role) game with a tuneable degree of conflict of interest, two outcomes
(cooperative coordination and non-coordination) and four different strategies.
The game is played by a population of individuals distributed in space who
interact locally within a fixed neighborhood of their positions. Players subsist
on the energy gained by playing the game and reproduce at a rate which is
proportional to their capacity for accumulating extra energy. This capacity depends
on the strategy they play, on the local pool of strategies of the
other players and also on ecological factors such as the density of local players
and the density of local resources.

Each time the game is played the energy contained in a local food source is 
at stake. If the players cooperate they share this energy equally; if they do not, 
then the first player gets a proportion c which is greater than one half and the 
second player gets nothing. The parameter c is a measure of the degree of conflict 
between the first player and the second (low conflict for c near 0.5 and high 
conflict for c near 1). The outcome of the game depends on the strategies played 
which remain fixed during each individual’s lifetime. Certain combinations of 
initiating actions (first player) and responses (second player) are cooperative and 
others are non-cooperative. Players do not choose directly whether to cooperate 
or not, they just ‘choose’ (in evolutionary terms) which actions and responses 
to play. 
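The payoff rule can be written as a small function. Which combinations of actions and responses count as cooperative is not spelled out here, so this sketch takes the outcome as a boolean flag; the function name `payoffs` and the validity check on c are our assumptions.

```python
def payoffs(food_energy, c, cooperative):
    """Energy gained by (first player, second player) in one game.

    cooperative: True if the action/response pair coordinates.
    c: share taken by the first player when there is no coordination."""
    if not 0.5 < c <= 1.0:
        raise ValueError("c must lie in (0.5, 1]")
    if cooperative:
        return food_energy / 2, food_energy / 2  # equal sharing
    return c * food_energy, 0.0  # first player takes c, second gets nothing


print(payoffs(10.0, 0.75, cooperative=False))  # -> (7.5, 0.0)
```

As c approaches 0.5 the first player gains little from non-coordination, which is why c serves as the measure of conflict between the two roles.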

With a basic repertoire of two possible actions and two possible responses 
the number of different strategies is four. Of these, two may be called ‘self- 
cooperative’ in the sense that two players playing one such strategy will always 
cooperate independently of the roles they play. The other two are called ‘non- 
self-cooperative’ (‘non-cooperative’ for short). Both simple and extended game- 
theoretic analyses have been performed for the mixed-medium approach and 
all of them resulted either in oscillations in the composition of the population 
or in a combination of non-cooperative strategies as the evolutionarily stable 
outcome. In both cases the overall proportion of games resulting in coordination 
was equal to the baseline level that is expected to occur by chance (50%). Never 
was a single self-cooperative strategy a non-invadable solution. 

Similar results were also observed using a partial differential equations approach
in which players are spatially distributed and interactions occur within a
finite local range (Di Paolo, 1999, submitted). However, in a spatially extended,
individual-based model of the same game the outcome was different. In this
model players are not treated as continuous densities of strategies but are rather
implemented as discrete individuals, and all interactions are subject to different
kinds of noise. Strategies are encoded for each individual in a binary haploid
genotype and an evolutionary algorithm is run². It is observed, in contradiction
with the game-theoretic results, that for values of c slightly higher than
0.5 the population tends to be constituted by a single self-cooperative strategy
and a small density of a competing non-cooperative one and, as c is increased,
the global level of cooperation decreases in a more or less linear fashion until
c = 0.65, where the baseline level is obtained (see Fig. 1).

² Other details: 2-D toroidal arena (100 × 100), square local neighborhood of size
10 × 10, variable population size, overlapping generations, sexual reproduction, local
selection of reproductive mates and local allocation of offspring. The offspring’s genotype
is built by uniformly recombining the parental genotypes and adding point mutations
(probability of no modification after a mutation event: 0.94), see (Di Paolo, 1999;
submitted).




Fig. 1. Average value for the level of cooperative coordination for different values of
c. Each point is the average of 5 simulation runs. For each run the level of coordination
is averaged in time over the steady state. The line represents a linear regression
(correlation coefficient: -0.982, slope: -3.32, null hypothesis: no variation with c,
P < 0.005). Error bars indicate standard deviation.



In (Di Paolo, 1997), and later in more detail in (Di Paolo, 1999; submitted), 
an explanation is offered of this and other observations in terms of the effects 
introduced by the spatial organization of the population into discrete, separate 
and relatively stable clusters of players. The ecological forces that account for 
the stability of these clusters are also responsible for breaking many of the inbuilt 
symmetries between the roles in the game. Thus individuals at the center of a 
cluster play the second role more often than those at the periphery because of the 
better opportunities for being selected by other players as partners in the game. 
Moreover, other geometrical factors come into play. For instance, the length of
genealogies surviving without interruption tends to be longer if they start near 
the center of the cluster than if they start near the border so that all players 
tend to reflect the strategy played at the center. 

This situation combines with the fact that players are discrete entities and
therefore, for invasion of a mutant strategy to occur, a minimal threshold density
must be achieved so that the average extra gain of the invader rises above fluc-
tuations and the new strain is able to assert itself (see for instance de la Torre
& Holland, 1990; Tsimring, Levine & Kessler, 1996; Abramson & Zanette, 1998;
Goodnight, 1992). The asymmetries introduced by the ecological organization
make the achievement of threshold density easier for self-cooperative strategies
if c is near 0.5 and the equilibrium density of non-cooperative strategies increases
more or less linearly with c until their capability for invasion is regained. This 
is how the behavior shown in Fig. 1 is explained. Of course, it is possible to 
ask whether this complicated picture in which selective arguments are mixed 
with ecological dynamics is indeed necessary for explaining the results. Perhaps 
a simpler explanation in the form of a kin-selective argument would be enough 
and more economical. This is something that has to be tested. 

3 Kin Selection 

As mentioned above, it has often been found that cooperative interactions under 
circumstances of conflict can be stable if a spatial dimension is added to the evo- 
lutionary scenario even when simpler selective arguments predict the contrary. 
This has been demonstrated both by mathematical considerations and by computer
models in the case of the Prisoner's Dilemma (Axelrod, 1984; Nakamaru
et al., 1997) and communication games (Ackley & Littman, 1994; Krakauer &
Pagel, 1995; Oliphant, 1996). Depending on the particular features of the model 
there may be more than one possible explanation for this phenomenon. For
instance, Axelrod argues that spatial clustering favors reciprocity in the case of
the TIT-FOR-TAT strategy (Axelrod, 1984, pp. 65-69). A different explanation
is preferred by Ackley & Littman (1994) and by Oliphant (1996) for their respec-
tive models. This explanation involves the concept of kin selection (Hamilton,
1964). It is argued that, since the regions from which mating partners are chosen
and into which offspring are allocated tend to coincide with the areas from which
game co-participants are selected, players will be highly related, so that a
cooperative player will tend to increase, on average, the frequency of genes
identical to its own in other players, i.e. its inclusive fitness.

If fitness is related to the degree to which an organism can pass copies of 
its own genes to the next generation, inclusive fitness is a generalization of this 
concept so that it can also account for the presence of traits that help the 
transmission of copies of identical genes that happen to be located in other 
individuals. As the well-known example goes, a behavior that is detrimental for
an individual but which produces benefits of the same magnitude for more than
two siblings will tend, under certain circumstances, to increase the inclusive fitness
associated with that behavior and therefore it will be favored by selection. This
is so because the probability of finding the same given gene in a sibling is, in
haploid and diploid genetic systems, equal to 1/2. Thus, the general condition,
known as Hamilton's rule, for a trait or behavior in individual i to be selected is
not that it has a positive individual fitness effect but that it has a positive
inclusive fitness effect:

$$w_i^{\mathrm{inc}} = w_i + \sum_{j \neq i} r_{ij}\, w_{j,i} > 0 \qquad (1)$$

where $r_{ij}$ is the degree of relatedness between i and another individual j (defined
carefully below) whose own individual fitness is affected by a quantity $w_{j,i}$ by
the actions of i. The sum is extended over all the other individuals who may
be affected by the trait or behavior in question. Even in cases where $w_i < 0$,
inclusive fitness can still be positive as long as the remaining terms in the sum
are large enough.
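Condition (1) is simple to evaluate once the fitness effects and relatedness values are known. The Python sketch below is illustrative only: the function names and the representation of the other individuals as (r_ij, w_ji) pairs are assumptions, not part of the model described here. It reproduces the sibling example from the text with a cost of 1 to the actor and a benefit of 1 to each of three siblings.

```python
def inclusive_fitness(w_i: float, effects: list[tuple[float, float]]) -> float:
    """Left-hand side of condition (1): w_i + sum of r_ij * w_ji.

    w_i     -- fitness effect of the trait on the actor i itself
    effects -- one (r_ij, w_ji) pair per other individual j affected by i
    """
    return w_i + sum(r_ij * w_ji for r_ij, w_ji in effects)


def favoured_by_selection(w_i: float, effects: list[tuple[float, float]]) -> bool:
    """Hamilton's rule as in (1): the trait is selected if the effect is positive."""
    return inclusive_fitness(w_i, effects) > 0


# Cost of 1 to the actor, benefit of 1 to each of three siblings (r = 1/2).
siblings = [(0.5, 1.0)] * 3
print(inclusive_fitness(-1.0, siblings))      # 0.5
print(favoured_by_selection(-1.0, siblings))  # True
```

With only two siblings the inclusive fitness effect is exactly zero, which is the break-even point of the rule.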

Here it is clear why the restricted dispersal induced by space can play an impor-
tant role. If interaction and reproduction are local processes, i.e. if co-participants
as well as reproductive mates are selected from an individual's vicinity and off-
spring are allocated into the same vicinity, then the average value of the $r_{ij}$
could be expected to be higher than in a well-mixed population. Therefore,
cooperative behaviors at the expense of one individual but which benefit others
in the local vicinity could at the same time tend to be beneficial to individuals
that are highly related³, a situation which may result in $w_i^{\mathrm{inc}} > 0$.

Such is indeed the kind of argument advanced by Ackley & Littman (1994),
Oliphant (1996) and others. Unfortunately, none of these authors actually shows 
that this is the case in their models. They just content themselves with finding 
a good candidate explanation without verifying if condition (1) is fulfilled, de- 
spite the fact that such a verification should be easy enough to perform in the 
computer models involved. 

In the model discussed in the previous section, kin selection arguments are 
not straightforward in the sense that the actions and responses of the players 
cannot be said to be intrinsically cooperative or non-cooperative; they depend 
on the context of the strategies used by the rest of the players. However, under 
the assumption of weak selection pressure it is possible to postulate a situation of 
quasi-equilibrium in which the context is fixed and then actions and responses 
could be seen as cooperative in themselves. This is by no means the general 
situation in this model but this assumption will be maintained in order to see 
that even if this situation were true, kin selection would not be enough to explain 
the obtained results. 

In order to test the plausibility of kin selection as a valid explanation for the 
evolution of cooperative coordination, a calculation is performed of the degree of 
relatedness between individuals and their average partners in the game. Genetic 
similarity can be caused by descent but also by other factors such as conver- 
gence, founder effects, etc. However, relatedness does not intend to measure just 
genetic similarity, otherwise all sorts of intra-specific conflicts and competition 
would be inexplicable. To take this subtlety into account, relatedness is defined
following (Grafen, 1991) as the degree of genetic similarity between two individ-
uals over and above the average similarity within the population in which the
individuals interact. In this way, if the difference in genetic constitution between
two individuals is zero then their relatedness is equal to one, which means that,
from the point of view of gene frequencies, for an individual to help the other
is the same as to help itself. If the difference between the genetic constitutions
of the two players is the same as the difference between one of them and the
average genetic constitution in the population, then for that individual relatedness
is zero, since in cooperating with the other player it is not contributing to
an increase in the frequency of genes similar to its own.

³ However, this need not be the case; see footnote 1.

Relatedness can be estimated straightforwardly in the current model by taking
the perspective of the individual player and keeping track of the average
relatedness with the partners it encounters throughout its lifetime. For each
game that is played, the Hamming distance $d_{ij}$ between the binary genotypes of
the participants is calculated, as well as the distances between each genotype and
the population average genetic constitution⁴, $d_i^{\mathrm{avg}}$ and $d_j^{\mathrm{avg}}$. The relatedness of
individual i to individual j is:

$$r_{ij} = 1 - \frac{d_{ij}}{d_i^{\mathrm{avg}}} \qquad (2)$$

Notice that while $d_{ij} = d_{ji}$, in general $r_{ij} \neq r_{ji}$, since for each individual
relatedness is defined with respect to its own distance to the population average
constitution⁵. This quantity is averaged for each individual during its lifetime in
order to reflect its mean relatedness to its local partners.
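Equation (2) can be computed directly from genotypes. The Python sketch below assumes genotypes are lists of 0/1 alleles and uses the distance measure described in the footnote on the average genetic constitution (sum of per-locus absolute differences, which reduces to the Hamming distance for binary strings). The function names and the four-individual example population are hypothetical, and the lifetime averaging over partners is not shown.

```python
def distance(a: list[float], b: list[float]) -> float:
    """Sum of per-locus absolute differences; the Hamming distance for binary strings."""
    if len(a) != len(b):
        raise ValueError("genotypes must have equal length")
    return sum(abs(x - y) for x, y in zip(a, b))


def average_constitution(genotypes: list[list[int]]) -> list[float]:
    """Per-locus mean of binary genotypes, i.e. the proportion of 1 alleles per locus."""
    n = len(genotypes)
    return [sum(locus) / n for locus in zip(*genotypes)]


def relatedness(g_i: list[int], g_j: list[int], avg: list[float]) -> float:
    """Equation (2): r_ij = 1 - d_ij / d_i_avg, relative to i's own distance to avg."""
    d_i_avg = distance(g_i, avg)
    if d_i_avg == 0:
        raise ValueError("r_ij is undefined when g_i equals the average constitution")
    return 1.0 - distance(g_i, g_j) / d_i_avg


population = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
avg = average_constitution(population)           # [0.5, 0.75, 0.5, 0.75]
r_ij = relatedness(population[0], population[1], avg)
r_ji = relatedness(population[1], population[0], avg)
print(r_ij, r_ji)  # r_ij != r_ji although d_ij == d_ji
```

Here $d_{ij} = 1$ in both directions, but $d_i^{\mathrm{avg}} = 2$ while $d_j^{\mathrm{avg}} = 1.5$, so $r_{ij} = 0.5$ and $r_{ji} \approx 0.33$.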

In the present model, if the average genetic constitution is calculated using 
the whole population instead of just the population which is bound to be affected 
by the actions of a given individual, relatedness will be overestimated. It must 
be remembered that clusters have a fairly independent evolutionary history. The 
degree of extra similarity that relatedness intends to measure should be in terms 
of the mean population with which the player and its offspring have a chance to 
interact. Whereas the average constitution used for the first measure involves the 
whole population, in a more accurate estimation the average genetic constitution 
used for the calculation of relatedness is taken as the one ‘seen’ by each player 
during its lifetime within its economic neighborhood (Queller, 1994). For each 
individual, the genetic constitution of all its partners in the game is averaged. 
The first estimation or upper bound (averaging over the whole population) is 
kept only for the purpose of comparison since it is assumed that the measure of 
interest lies much closer to the second estimation. 

Figure 2 shows the variations in the two estimations of relatedness for dif-
ferent values of c, each point obtained by averaging the temporal mean after
transients of 5 simulation runs with identical parameters. It is observed that
neither of the measures varies much with c. This behavior can be understood by
considering that relatedness is given mainly by the spatial relations between
players of different generations, which should not be affected much by c⁶.

Fig. 2. Average estimations of relatedness vs. c. Lines depict linear regressions. Aver-
ages over 5 runs and over the steady state. Error bars indicate standard deviation.

⁴ The average genetic constitution is simply a string calculated as the sum of all the
genotypes in the chosen sub-population divided by the number of players. Since
there are only two possible alleles per locus, represented by 0s and 1s, every
component of the average string gives the proportion of 1 alleles at that particular
locus. In order to calculate the distance from an individual genotype to this
real-valued string, a measure is used which reduces to the Hamming distance when
applied between two binary strings. This measure involves summing, for each
component of the string, the absolute distance between the individual allele and
the corresponding population average. For instance, the distance between the string
(1, 0, 1, 0) and the string (0.7, 0.3, 0.9, 0.9) is d = 0.3 + 0.3 + 0.1 + 0.9 = 1.6.

⁵ This estimation coincides with the one proposed in (Queller & Goodnight, 1989)
using each allele as a neutral genetic marker.

By calculating the corresponding inclusive fitness per game per unit of energy,
it is found that for a first player (i) the payoff for no cooperation is $W_i^{\mathrm{nc}} = c$,
and the payoff for cooperation is $W_i^{\mathrm{c}} = (1 + r)/2$. Using condition (1) and
taking the best estimation of relatedness (r = 0.11), it can be concluded that
cooperation should be the favored outcome only if c < 0.55, which is not enough
to explain the results.
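The comparison above reduces to a one-line threshold. The Python sketch below assumes the inclusive payoff for cooperation is (1 + r)/2 against c for no cooperation, the form consistent with the thresholds quoted in the text; the function names are hypothetical. It evaluates the critical value of c for the two relatedness estimates, giving 0.555 and 0.725, which agree with the quoted bounds of 0.55 and 0.72 after truncation.

```python
def cooperation_threshold(r: float) -> float:
    """Critical value c* = (1 + r) / 2 below which cooperation is favoured."""
    return (1.0 + r) / 2.0


def cooperation_favoured(c: float, r: float) -> bool:
    """Compare the inclusive payoff of cooperating, (1 + r) / 2, with c."""
    return (1.0 + r) / 2.0 > c


for label, r in [("local estimate", 0.11), ("upper bound", 0.45)]:
    print(f"{label}: cooperation favoured for c < {cooperation_threshold(r):.3f}")
```

Because r barely changes with c (Fig. 2), this condition predicts a step change at c*, not the linear decline seen in Fig. 1.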

However, if the estimation of relatedness were as high as the upper bound 
(r = 0.45) then cooperation would be stable for c < 0.72. So it seems that if a 
generous estimation of relatedness could be justified, then kin selection would 
suffice for explaining the evolution of cooperative coordination in the present 
model. However, the expected result, if such were the case, would be a constant
high level of cooperation for any value of c between 0.5 and 0.72, and a step-wise
change to no cooperation for c > 0.72, which is not what is observed. As seen in
Fig. 1, the level of cooperative coordination is a linearly decreasing function of c,
even in ranges where relatedness does not vary. The condition for the stability of
cooperation is given by establishing the sign of the inequality between the inclu-
sive fitness for cooperation and for no cooperation. There are two parameters in
this condition: relatedness and c. A linear variation of the degree of cooperation
with c could only be explained by some spatial variation in relatedness resulting
in a linear decrease of the proportion of players for whom cooperation implies
the best increase in inclusive fitness. Not only is such a variation not observed,
but it would also have to be manifested as a decrease in the global average of
relatedness, which Fig. 2 does not show. This observation suggests that, whatever
the mechanism responsible for the stabilization of cooperation, it cannot work
equally well for all values of c below 0.72, as kin selection would if relatedness
were estimated by its upper bound.

⁶ The high value of relatedness for the upper bound reflects the fact that intra-cluster
similarity is much greater than inter-cluster similarity.

Consequently, even after many concessions, the conditions for explaining the 
results in terms of kin selection are not met. 

4 Conclusions 

While kin selection remains an important concept for studying the evolution 
of social behavior, its use is far from being universal or automatic. The theory 
provides very simple rules for verifying its conditions of applicability and those 
rules should be used. This paper demonstrates how to do this for an individual- 
based model instantiated in a computer simulation. Additionally, it shows that 
the conditions for kin selection fail to be met in this model, thus strengthening
an alternative explanation in terms of ecological and selective factors. 

Independently of this last result, the main point of the paper should be taken 
as a methodological one. The behavior of computer simulations can be quite hard 
to understand in itself. There is no reason why anything should be obvious in 
view of the complexity of contemporary models. Fortunately, the strength of 
simulations is that any hypothesis regarding what is going on in them can be 
put to the test with relative ease. To fail to do this is to fail to take advantage 
of one of the crucial aspects of simulations. In science, a hypothesis or an expla- 
nation should be tested against all the available evidence and there is no reason 
to expect a lesser standard from Artificial Life. 

This work was done before finishing my PhD at the School of Cognitive and Com- 
puting Sciences, University of Sussex. Many thanks to Phil Husbands, Inman Harvey 
and John Stewart for their comments. Thanks also to Seth Bullock, Jason Noble and 
Matt Quinn for lively discussions. 
Abstract. We investigate simple mechanisms for social learning in an
evolutionary simulation of food-preference copying in Norway rats. These 
animals learn preferences by interacting with conspecifics, but, unexpect- 
edly, they fail to learn aversions after interacting with a poisoned demon- 
strator. They also follow each other to food sites. Simulation results show 
that failure to discriminate between sick and healthy demonstrators may 
be due to details of food toxicity in foraging environments. A seemingly 
complex instance of social information transmission is explained through 
the action of simple behaviours in an appropriately structured environ- 
ment. 



1 Introduction 

Animal behaviour can be seen as the problem of what to do next. Natural se- 
lection has shaped the behavioural strategies of the animals we see today, but, 
clearly, it has arrived at different solutions in different species. For an animal fac- 
ing a particular environmental challenge, three broad sources of strategy can be 
distinguished: instinct, learning, and social learning. For example, suppose that 
a foraging animal has to decide whether or not to eat a piece of toxic, unripe 
fruit that it has found. The decision might be made instinctively; the animal 
has an inherited tendency to avoid fruit of that colour and texture. Alterna- 
tively, the animal might have learned through bitter experience that such fruit 
is unpalatable. Finally, it could have learned socially: perhaps it has observed 
conspecifics rejecting this kind of fruit, or has seen conspecifics become ill after 
eating it. In this paper we focus on the third strategy source, social learning, to 
explore the way in which simple specific mechanisms of social information gath- 
ering can interact with structured environments to yield unexpected behavioural 
implications. 

In recent years there has been some progress towards understanding the adap- 
tive function of social learning. Models of cultural [1] and “highly horizontal” 
transmission [2] help to delineate the conditions under which it will be advan- 
tageous for individuals to learn from others rather than finding things out for 
themselves. But we are also interested in questions of mechanism: how exactly 
does an animal gain information from the behaviour of its conspecifics? A variety 
of candidate mechanisms have been proposed [3], and complex possibilities such
as intentional imitation have received a great deal of attention. Nevertheless, 
seemingly complex behaviour may be produced by simpler mechanisms. For ex- 
ample, Galef [3] discusses “stimulus enhancement” in which a tendency on the 
part of naive animals to approach conspecifics leads to their being more likely to 
encounter one set of stimuli rather than another, and thus shapes their (individ- 
ual) learning experience. The power of such simpler evolutionary solutions has 
probably been underestimated. Several phenomena that were once seen as in- 
volving imitation, such as the opening of milk bottles by birds and the washing of 
food by monkeys, have since been questioned [3], and parsimonious explanations 
have been offered in terms of processes like stimulus enhancement. However, it is 
difficult to design experiments that conclusively expose the mechanism at work 
in particular cases. 

We believe that the individual-based simulations characteristic of artificial 
life can be useful tools in the investigation of social learning, much as they have 
been for studying the evolution of individual learning. It has long been recog- 
nized within artificial life that complex global phenomena can arise from simple 
local rules, and this is precisely what some researchers suspect is happening in 
animal social learning: individuals follow a simple rule (e.g., “stay close to your 
mother”) and, in combination with associative learning, the overall pattern of 
behaviour that arises makes human observers suspect imitation. Although work 
in artificial life has certainly considered social dynamics in contexts such as for- 
aging, communication, and flocking or schooling, there has been relatively little 
work on the specific topic of social learning. The model most relevant to our 
own work is by Toquenaga et al. [4], who constructed a simulation of foraging 
behaviour in egrets. The authors demonstrate that stimulus enhancement is im- 
portant in the evolution of flock foraging and colonial roosting, and is more likely 
to evolve when resources are patchy rather than evenly distributed. 

We look at social learning in Norway rats (Rattus norvegicus) - an opportunistic,
central place foraging species - to see how their specific and rather
surprising social learning mechanisms may have evolved in response to environ- 
mental features. These rats employ at least two simple mechanisms that allow 
them to share information about food [5,6]. Firstly, they have a robust tendency 
to copy the feeding preferences of their conspecifics. A rat will develop a marked 
preference for a novel food that it smells on the breath of another, and the ef- 
fect is strong enough to make a rat choose the novel food type over its normal 
diet, despite the fact that rats usually avoid new foods. The key stimulus is the 
detection of the novel food odor in combination with carbon disulfide, a com- 
ponent of rat breath. Rats will not, for instance, develop a preference for foods 
that an experimenter has wiped onto the fur of another rat. Secondly, rats will 
spontaneously follow conspecifics on foraging trips out of the nest; this habit is 
especially pronounced in younger animals. Such behaviour clearly suggests that 
stimulus enhancement may be occurring. One of the ways a rat could come to 
exploit a new food source would be simply by following another and learning 
from the experience. 
Galef et al. [7] uncovered an apparent paradox in rats' social learning. They
assumed that if rats could acquire food preferences through social interaction, 
they would probably also be able to learn an aversion to a novel, toxic food by 
smelling it on the breath of a demonstrator rat and simultaneously noting that 
the demonstrator was ill. Experiments revealed, however, that this is not the 
case. Rats are not sensitive to a demonstrator’s state of health, and in fact only 
ever develop a preference for the novel food. 

This surprising finding was the starting point for our own investigations. The 
value of distributed intelligence through copying others’ food choices seems clear 
in an opportunistic forager that must deal with new and potentially toxic foods, 
especially when seen alongside the fact that rats will normally avoid novel foods. 
But why don’t they discriminate between sick and healthy demonstrators? It is 
not because they can't: rats perform well on a wide variety of discrimination
tasks, and are capable of identifying sick conspecifics using odor and behavioural 
cues [8]. Curiously, other species such as blackbirds [9] and chickens [10] do 
manage to learn both preferences and aversions through observation. 

We suspected that the answer might depend on characteristics of the rats’ 
foraging environment; specifically on the probability that eating a toxic food 
would result in the death of the animal. To test this suspicion we constructed 
an evolutionary simulation, within which we systematically varied the lethality 
of toxic foods in the environment, and observed the effect on the evolution of 
a gene for discriminating between sick and healthy demonstrators. We were 
also interested in following behaviour, and extended the initial model to include 
this possibility. In particular, we wanted to demonstrate that these two simple 
mechanisms - copying and following - could together account for apparently 
complex social learning. We were also interested in determining whether there 
was any interaction between the two: are the benefits of copying food preferences 
and those of following others independent, or do they interfere with each other 
in some way? 



2 The Evolution of Preference Copying 

2.1 Modelling Learning Rats 

In our simulation an initial population of 100 rats foraged from five centrally 
located nests. Rats foraged by night, with each night divided into five foraging 
periods. During each period, a rat could visit one of 25 foraging sites: if it found 
food at that site, it had to decide whether or not to eat it. If a rat chose to eat, 
and if there was sufficient food, it would fill its stomach and return to the nest 
- the night’s foraging was over. However, rats that rejected the food they had 
found, and rats that ate only a partial meal due to competition, would continue 
to forage until they had eaten their fill or until the night was over. If a rat 
consumed nutritious food it gained energy, but 10% of the food types in the 
environment were toxic. If a rat consumed a toxic food there was a parameter 
governing the probability that it would die at once; otherwise it would lose some 
energy and would show signs of poisoning when it returned to the nest. Rats
that consumed no food at all would eventually die of starvation. 

The rats were given a simple memory: any food they encountered would 
either be novel, familiar or aversive. For newborns, all foods were novel. The 
first variable under evolutionary control was the probability E that a rat would 
eat a novel food. After eating a new food, the rat would remember it as either 
familiar or aversive depending on whether or not it was toxic. Rats would always 
eat familiar foods, and always reject aversive foods. Rats were also assumed to 
be capable of remembering where they had foraged last night, and a binary gene 
P controlled whether or not they would persist and return to their last successful 
feeding site, if any, during the first period of the next night. 
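The three-state food memory and the evolved gene E can be sketched as follows. This is a minimal Python illustration under stated assumptions: the class and method names are hypothetical, a food absent from the memory is treated as novel, and the persistence gene P is left out for brevity.

```python
import random
from enum import Enum


class Memory(Enum):
    NOVEL = "novel"
    FAMILIAR = "familiar"
    AVERSIVE = "aversive"


class ForagingRat:
    """Hypothetical rat with the three-state food memory described in the text."""

    def __init__(self, eat_novel_prob: float) -> None:
        self.E = eat_novel_prob               # evolved probability of eating a novel food
        self.memory: dict[int, Memory] = {}   # food type -> state; absent means novel

    def recall(self, food_type: int) -> Memory:
        return self.memory.get(food_type, Memory.NOVEL)

    def decides_to_eat(self, food_type: int, rng: random.Random) -> bool:
        state = self.recall(food_type)
        if state is Memory.FAMILIAR:
            return True                       # familiar foods are always eaten
        if state is Memory.AVERSIVE:
            return False                      # aversive foods are always rejected
        return rng.random() < self.E          # novel foods are eaten with probability E

    def remember_meal(self, food_type: int, toxic: bool) -> None:
        # After eating a new food, it becomes aversive if toxic, familiar otherwise.
        self.memory[food_type] = Memory.AVERSIVE if toxic else Memory.FAMILIAR


rng = random.Random(0)
rat = ForagingRat(eat_novel_prob=0.9)
if rat.decides_to_eat(7, rng):
    rat.remember_meal(7, toxic=False)
```

Keeping E as a probability rather than a binary gene lets selection tune caution gradually, which matters for the sharp drop in E reported at higher lethality.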

Each individual had two more genes controlling its preference-copying be- 
haviour: a binary gene S indicating whether the rat would learn about new 
foods by smelling the breath of other rats, and a binary gene D that controlled 
whether the artificial rat (unlike real rats) would discriminate with regard to 
the state of health of another rat whose breath it smelled, avoiding foods eaten 
by sick rats and preferring foods on the breath of healthy rats. Upon returning 
to the nest each morning, rats could potentially smell the breath of a randomly 
chosen nestmate as a way of gaining information about new foods in the environ-
ment. If a rat had only the smell-based learning ability, S, it would simply
smell the breath of a nestmate and become familiar with the type of food the 
nestmate had eaten that day, if any. But the rat might thereby develop famil- 
iarity with and hence preference for a food that was in fact toxic: some rats 
died immediately after eating a poisonous food, but others made it back to the 
nest and were ill. Only if the learning rat also had the ability to discriminate, 
D, would it develop preference or aversion for the new food type depending on 
whether the nestmate was showing signs of poisoning. (A low level of error could 
also be associated with this discrimination ability.) 
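The morning breath-smelling step can be written as a single update on a rat's food memory. The Python sketch below is hypothetical in its names and argument layout. It also assumes, since the text does not say, that only foods still novel to the learner are affected, and that the discrimination error simply flips the perceived health of the nestmate.

```python
import random

FAMILIAR = "familiar"
AVERSIVE = "aversive"


def smell_nestmate(memory, demo_food, demo_is_sick, S, D, error_rate, rng):
    """Update a rat's food memory after smelling one randomly chosen nestmate.

    memory       -- dict mapping food type to FAMILIAR or AVERSIVE (absent = novel)
    demo_food    -- food type the nestmate ate that day, or None if it ate nothing
    demo_is_sick -- whether the nestmate is showing signs of poisoning
    S, D         -- the binary smelling and discrimination genes
    error_rate   -- probability that a discriminating rat misjudges the nestmate's health
    """
    if not S or demo_food is None:
        return                          # no smell-based learning, or nothing to smell
    if demo_food in memory:
        return                          # assumption: only novel foods are learned socially
    if not D:
        memory[demo_food] = FAMILIAR    # non-discriminators always become familiar
        return
    perceived_sick = demo_is_sick
    if rng.random() < error_rate:
        perceived_sick = not perceived_sick   # occasional discrimination error
    memory[demo_food] = AVERSIVE if perceived_sick else FAMILIAR
```

For example, a rat with S set and D unset that smells a sick nestmate still ends up with the nestmate's food marked as familiar, which is the behaviour observed in real rats.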

When a rat had accumulated a certain amount of energy, it would undergo 
simplified asexual reproduction. The carrying capacity of the environment was 
fixed at 100. If a newborn individual could not take the place of a rat that had 
recently died of poisoning or starvation, the current oldest rat would be selected 
for death by old age in order to make room. Newborn rats inherited (with a 
small chance of mutation) the four-element behavioural strategy of their parent 
described above, plus a level of starting energy. 
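Reproduction under a fixed carrying capacity can be sketched as below. The energy and mutation constants are taken from the parameter footnote of this paper; everything else is an assumption for illustration: the names, per-gene mutation at the stated rate, Gaussian mutation of E clipped to [0, 1], and identifying the oldest rat by its birth day.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Values from the parameter footnote of this paper.
REPRODUCTION_LEVEL = 1000   # energy needed to reproduce
COST_OF_OFFSPRING = 500     # energy deducted from the parent
OFFSPRING_ENERGY = 400      # starting energy of the newborn
MUTATION_RATE = 0.05
MUTATION_SD = 0.1           # for the real-valued gene E
CARRYING_CAPACITY = 100


@dataclass(eq=False)        # compare rats by identity, not by field values
class Rat:
    E: float                # probability of eating a novel food
    P: bool                 # persistence
    S: bool                 # smelling
    D: bool                 # discrimination
    energy: float
    birth_day: int


def inherit(parent: Rat, day: int, rng: random.Random) -> Rat:
    """Copy the four-gene strategy with per-gene mutation (assumed scheme)."""
    E = parent.E
    if rng.random() < MUTATION_RATE:
        E = min(1.0, max(0.0, rng.gauss(E, MUTATION_SD)))  # clipping is an assumption
    P, S, D = ((not g) if rng.random() < MUTATION_RATE else g
               for g in (parent.P, parent.S, parent.D))
    return Rat(E, P, S, D, OFFSPRING_ENERGY, day)


def maybe_reproduce(population: list[Rat], parent: Rat, day: int,
                    rng: random.Random) -> Optional[Rat]:
    """Asexual reproduction with a fixed carrying capacity."""
    if parent.energy < REPRODUCTION_LEVEL:
        return None
    parent.energy -= COST_OF_OFFSPRING
    child = inherit(parent, day, rng)
    if len(population) >= CARRYING_CAPACITY:
        # No vacancy left by poisoning or starvation: the oldest rat dies.
        oldest = min(population, key=lambda r: r.birth_day)
        population.remove(oldest)
    population.append(child)
    return child
```

Using `eq=False` matters here: with field-based equality, `list.remove` could delete a different rat that happens to have identical genes and energy.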

Parcels of food appeared in random sites in the environment at a constant 
rate. In order to ensure that the rats always had to deal with novelty, a total 
of 100 food types were used, but with a “window” of 10 food types that could 
appear at any one time. Every 16 days the window would advance, so that one 
old food type stopped appearing, and a novel one entered the scene. The lifespan 
of an individual rat (which was an emergent property of the simulation) never 
grew long enough for it to experience all 100 food types. 
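The sliding window of food types can be expressed as a function of the day. In this Python sketch the function name is hypothetical, and wrapping around to the first type after the hundredth is an assumption, since the text does not say what happens when the window reaches the end of the list.

```python
def available_food_types(day: int, n_types: int = 100, window: int = 10,
                         advance_every: int = 16) -> list[int]:
    """Food types that can appear on `day` under a sliding window.

    Every `advance_every` days the window moves by one type, so one old type
    drops out and one new type enters. Wrapping around after the last type is
    an assumption.
    """
    offset = day // advance_every
    return [(offset + k) % n_types for k in range(window)]


today = set(available_food_types(16))
yesterday = set(available_food_types(15))
print(sorted(yesterday - today), sorted(today - yesterday))  # [0] [10]
```

Under this scheme a full cycle through all 100 types takes 1600 days, far longer than the mean lifetime of 268 days reported for the benign environment.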
Whilst we have tried to make this model reflect part of the lives of real rats,
the parameters used in the simulation* are not as closely matched to real data
as we would like. There is a great deal of information available on the behaviour 
of rats in the laboratory, but data on the ecology of wild rats is not extensive 
[5]. We have therefore pitched the simulation at a relatively abstract level, and 
in so doing we hope to have captured some aspects of the selection pressures 
impinging on social foragers in general. 



2.2 Results 

To test our hypothesis, we varied the probability of death due to
eating a toxic food. For each level of lethality investigated, ten evolutionary 
runs, each 200,000 days in length, were performed. The statistics reported below 
describe the state of populations at the end of these runs. 

When the lethality value was zero, i.e., in a benign environment, there was 
no selection pressure on the rats to smell each other or to discriminate: Fig. 1(a) 
shows that gene frequencies for S and D remained close to 50%, the value ex- 
pected by chance. Nor were the rats particularly persistent, as we will see later 
in Fig. 3(b). At the same time, the mean probability for eating novel foods (E) 
was high at 89.8% (Fig. 1(b)). The mean lifetime of a rat was 268 days, during 
which time approximately 17 new foods would have made an appearance. The 
rats were, on average, familiar with 13.3 food types, and had aversions to 2.1 
foods: these frequencies are not too far from the 10% base rate of toxic foods 
in the environment. Thus, when poison results only in a stomach ache and an 
aversion to trying that food again in future, simulated rats are open to trying 
new foods, and pay no special attention to the eating habits of others. 

It is clear from Fig. 1(a) that as the lethality level increases there is increasing 
selection pressure for learning from others. Rats become more likely to smell 
the breath of conspecifics, and to discriminate between the sick and the healthy 
when doing so. At the same time, they are still willing to try new foods that they 
come across - Fig. 1(b). However, when the lethality level reaches approximately 
0.3, there is something like a phase transition in the results. The probability of 
eating novel foods drops dramatically; the overall mean probability for eating a 
novel food, across all runs with lethality greater than 0.3, was less than 1%. As 
lethality approaches 1.0, there is a uniform strong selection pressure on the gene 
for smelling the breath of others. Importantly, though, the selection pressure on 
discriminating decreases: as poison became more and more dangerous, it was 
no longer important to pay attention to the health of a demonstrator - we will 
argue in section 4 that this result explains the failure of real rats to discriminate. 
Aggregating the results for lethality levels greater than 0.3, the rats were familiar
with about the same number of foods (10.0), but they developed almost no
aversions (0.2). There were other effects: because the rats were more cautious in
their treatment of novel foods, they generally ate less, which led to increases in
the inter-birth interval and the average lifespan.

Fig. 1. (a) Mean frequency of the smelling and discrimination genes S and D by
lethality level: solid and dashed lines respectively. In this and subsequent figures,
error bars show the standard error across ten simulation runs. (b) Mean value of the
real-valued gene E, governing the probability of eating novel food, by lethality level.

* Rat stomach capacity = 10 food units; cost of living = 1 unit per day; default error
level in discrimination = 0.01; mutation rate = 0.05; standard deviation for mutation
of real-valued genes = 0.1; food parcel mean size = 100 units, standard deviation
= 40; food parcel input rate = 5 per day; reproduction level = 1000 units; cost of
offspring = 500 units; offspring starting energy = 400 units.




Fig. 2. Mean frequency of the discrimination gene D given zero (solid line) and 5% 
(dashed line) error rates in discriminative ability. 



The finding that selection for discrimination falls off with increasing lethality 
could have something to do with the error rate of the rats’ discriminative ability, 
normally set to 1%. We therefore considered the evolution of the gene D under 
zero-error and 5% error conditions. Fig. 2 shows that a higher error rate indeed 
leads to selection against discriminating. But even with zero error, the frequency 
of the gene D is close to the expected value for random drift (i.e., 50%), given high 
levels of lethality. Even if one’s discrimination ability were perfect, it appears 
that it would not be a selective advantage in an environment where eating a 
poisonous food would kill you four times out of five. 
3 Adding Following to the Model 

3.1 Changes in the Model 

In order to make following behaviour possible, the rats were given a fifth gene, 
F, that determined the probability that they would follow a random conspecific 
when leaving the nest for the first foraging period. Following was only ever 
selected if the rat was not being persistent. In other words, the decision tree for 
the rats was as follows. 

1. Check persistence gene, P; if set, was I successful last night? If so, return. 

2. Check the probability of following, F; should I follow someone else? 

3. If neither of the above applies, choose a random site. 

Rats that followed simply went to the same site as a randomly chosen conspecific 
who was not also following, but there was no way for a rat to tell whether it 
was following a persistent conspecific or one that had merely selected a site at 
random. (If all the rats chose to follow, they were all sent to random sites.) The 
extended model was in other respects the same as the basic model. 
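The decision tree and the following rule can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the `Rat` class, the `choose_sites` function and the `n_sites` parameter are hypothetical names, P is treated as a binary gene and F as a probability, and sites are simply numbered.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rat:
    persistent: bool                 # gene P: return to last night's site if it paid off
    follow_prob: float               # gene F: probability of following a conspecific
    last_site: Optional[int] = None  # site visited on the previous night
    fed_last_night: bool = False     # was that visit successful?
    following: bool = False
    site: Optional[int] = None


def choose_sites(rats: List[Rat], n_sites: int, rng: random.Random) -> List[int]:
    """Assign each rat a site for the first foraging period (hypothetical helper)."""
    leaders = []
    for rat in rats:
        rat.following = False
        # Step 1: a persistent rat that fed successfully last night goes back.
        if rat.persistent and rat.fed_last_night and rat.last_site is not None:
            rat.site = rat.last_site
            leaders.append(rat)
        # Step 2: otherwise it follows someone with probability F.
        elif rng.random() < rat.follow_prob:
            rat.following = True
            rat.site = None
        # Step 3: otherwise it picks a random site.
        else:
            rat.site = rng.randrange(n_sites)
            leaders.append(rat)
    # Followers copy a randomly chosen non-follower; they cannot tell whether that
    # rat was persistent or chose at random. If everyone follows, all go to random sites.
    for rat in rats:
        if rat.following:
            rat.site = rng.choice(leaders).site if leaders else rng.randrange(n_sites)
    return [rat.site for rat in rats]
```

Because followers are resolved only after every non-follower has picked a site, a follower never copies another follower, which matches the rule stated above.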



3.2 Results 

The evolution of the genes for smelling, discrimination and eating novel foods 
was not much affected by the possibility that rats could follow each other to 
food sites; the results were qualitatively similar to those shown in Fig. 1. The 
only difference was that the transition between liberal and conservative attitudes 
to new foods occurred at a lower level of lethality, approximately 0.15. 

The results for the evolution of the following gene are shown in Fig. 3(a). 
We were somewhat surprised to find that following other rats was not strongly 
selected for (solid line). There is a modest positive relationship between the mean 
value of F and lethality, but for the more benign environments the data could just 
as well be the result of a random walk. To find circumstances that could select 
for following, we ran a variant of the simulation in which food was less uniformly 
distributed. Following others to food is a kind of stimulus enhancement, which 
is more likely given patchy food distributions [4]. Instead of five food parcels 
with a mean size of 100 units arriving per day, one food parcel with a mean size 
of 500 units was supplied; Fig. 3(a) shows that under these conditions following 
behaviour was indeed more strongly selected for (dashed line). 
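The patchy condition changes how food arrives, not how much arrives on average: five parcels of mean size 100 and one parcel of mean size 500 both supply about 500 units per day. A small sketch makes this explicit. Assumptions: parcel sizes are drawn from a normal distribution clipped at zero, and the patchy condition keeps the standard deviation of 40 from the footnote (the text does not give it); `daily_parcels` and `mean_daily_food` are hypothetical helpers.

```python
import random
from typing import List


def daily_parcels(n_parcels: int, mean_size: float, sd: float,
                  rng: random.Random) -> List[float]:
    """One day's food parcels; sizes assumed Normal(mean, sd), clipped at zero."""
    return [max(0.0, rng.gauss(mean_size, sd)) for _ in range(n_parcels)]


def mean_daily_food(n_parcels: int, mean_size: float, sd: float,
                    days: int, seed: int = 0) -> float:
    """Average total food supplied per day over many simulated days."""
    rng = random.Random(seed)
    total = sum(sum(daily_parcels(n_parcels, mean_size, sd, rng)) for _ in range(days))
    return total / days


standard = mean_daily_food(5, 100.0, 40.0, days=10_000)  # five parcels of about 100 units
patchy = mean_daily_food(1, 500.0, 40.0, days=10_000)    # one parcel of about 500 units
```

Both averages come out close to 500 units, so the stronger selection for following in the patchy condition is not explained by a difference in total food supply.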

Another way of looking at following behaviour is to consider its effects on 
the gene for persistence. Fig. 3(b) shows the mean frequencies for the gene P 
for the initial model and for the model with following added. In the initial case 
(solid line), we can see that persistence becomes more important with increasing 
levels of lethality. In a dangerous world where you don’t want to try anything 
new, it pays to go back to the site of yesterday’s successful feeding. However, 
when following behaviour is possible (dashed line), increasing levels of lethality 
no longer produce selection pressure in favour of persistence. 
Fig. 3. (a) Mean value of the real-valued gene F, governing the probability of following 
other rats to a foraging site. The solid line shows the standard condition, while the 
dashed line shows the effect of a less uniform food distribution (see text). (b) Compar- 
ison of the frequencies for the persistence gene P with (dashed line) and without (solid 
line) the possibility of following behaviour. 



4 Implications for Real Rats 

The failure of real Norway rats to discriminate between sick and healthy demon- 
strators may be due to the details of food toxicity in their foraging environments. 
In the initial model, when the lethality value crosses the threshold of about 0.3 
the whole pattern of behaviour changes. The simulated rats become extremely 
wary of new foods, but attend closely to what their conspecifics are eating. How- 
ever, because they “know” that their nestmates are just as conservative as they 
are about trying new and potentially dangerous foods, the need to discrimi- 
nate between sick and healthy demonstrators is reduced. Observing a poisoned 
demonstrator would doubtless provide useful information, but it becomes such 
a rare event that there is little selection pressure for paying attention to it. 

The possibility of error in discrimination just makes things worse. In the 
terminology of signal detection theory, a hit would be correctly identifying a 
food as poisonous after observing a sick demonstrator who has eaten that food. 
A miss would be failing to identify a food as poisonous under these circumstances. 
Misses would certainly be costly, but false alarms may at times be even more 
so: given that one has a low likelihood of eating new foods, and that there is 
a limited number of foods present in the environment, believing falsely that a 
palatable food is poisonous could deprive a rat of a much-needed food source. 
As the error rate for discrimination increases, there must come a point when 
it is better to simply accept all foods detected on the breath of others, thus 
risking occasional poisoning but ensuring that no palatable food source ever 
goes unexploited. It is not clear what the error rates for discrimination would be 
in real Rattus norvegicus, but if we recognize that animals can be ill for reasons 
other than food poisoning, and that they may conceal illness when it exists, the 
levels of 1% and 5% seem conservative. 
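The signal-detection terms can be made concrete with a small Monte Carlo sketch. It assumes, for illustration only, that a demonstrator is sick exactly when the food is poisonous and that a judgement is wrong with a fixed, symmetric error rate. The text itself notes that real rats can be ill for other reasons, so this is a simplification. All function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict


def judge_poisonous(food_is_poisonous: bool, error_rate: float,
                    rng: random.Random) -> bool:
    """Judgement based on the demonstrator's apparent health; flipped with
    probability error_rate (symmetric-error assumption)."""
    flipped = rng.random() < error_rate
    return (not food_is_poisonous) if flipped else food_is_poisonous


def classify(food_is_poisonous: bool, judged_poisonous: bool) -> str:
    if food_is_poisonous:
        return "hit" if judged_poisonous else "miss"
    return "false alarm" if judged_poisonous else "correct rejection"


def outcome_rates(error_rate: float, p_poisonous: float, trials: int,
                  seed: int = 0) -> Dict[str, float]:
    """Fraction of observations falling into each signal-detection outcome."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(trials):
        poisonous = rng.random() < p_poisonous
        counts[classify(poisonous, judge_poisonous(poisonous, error_rate, rng))] += 1
    keys = ("hit", "miss", "false alarm", "correct rejection")
    return {k: counts[k] / trials for k in keys}
```

Under these assumptions a false alarm occurs with probability error_rate × (1 − p_poisonous) and a miss with probability error_rate × p_poisonous. When poisonous foods are rare, false alarms therefore outnumber misses, which is the situation in which the text argues they may cost a rat more.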
Furthermore, discrimination in the real world would not come for free. The 
animal would have to pay the time and energy costs of developing a sensory 
system and decision mechanism that allowed it to detect sickness in others and 
then act accordingly; the blind acceptance of any food odor that is smelled in 
conjunction with carbon disulfide is clearly a simpler way to do things. In the 
simulation we did not attempt to model this fact, i.e., we did not include any 
direct costs on the ability to discriminate. The existence of such direct costs for 
real rats would make discrimination even less likely to evolve. 

There is a clear empirical prediction arising from the initial model. Firstly, if 
we consider foraging populations with very low levels of danger associated with 
eating toxic food, animals in these populations should be content to eat novel 
food and ignore the experiences of others. Next, as lethality increases, individuals 
will become more likely to pay attention to the eating habits of others, and to 
discriminate as to the state of health of demonstrators. Finally, above a certain 
threshold of lethality we expect to find animals with a very low likelihood of 
eating novel foods, a great interest in what others are eating, and no strong 
tendency to discriminate between sick and healthy demonstrators. Norway rats 
appear to fit this last profile. Blackbirds [9] and chickens [10] may fit the second. 

The results for the extended simulation with the possibility of following have 
less clear implications. In increasingly lethal environments, following others to 
food sources is moderately favoured. When the distribution of food is patchy, 
there is reasonably strong selection pressure for following behaviour; this is in 
accordance with an earlier finding that patchy food distributions will promote 
stimulus enhancement behaviours [4]. But the interaction between following and 
persistence shown in Fig. 3(b) demands further analysis. The problem is that 
mean values for the probability of following in the non-patchy environment are 
close to 50%; this is exactly what would be expected if there was no selection 
pressure, and suggests that following is not particularly adaptive. Nevertheless, 
the observed means imply that about half the time rats will be following a 
conspecific rather than choosing a random food site, assuming they are not being 
persistent and returning to the scene of yesterday’s success. Comparing the initial 
model to the model with following added, we find that the gene for persistence 
is selected for in the former case, but not in the latter. The apparently chance 
levels of following behaviour are affecting the evolution of persistence; one could 
even argue that following behaviour takes the place of persistence. It is therefore 
not so clear that following behaviour is adaptively neutral. 

Presumably there is a trade-off between following others to food but having 
to share it when you get there, and thus taking a chance that you will get less 
than a full meal, versus choosing a food site randomly and having it to yourself. 
(However, there were always more rats than food sites; on average a rat would 
be sharing with four others in any case.) This may also have something to do 
with the results for persistence: if some fraction of the population is going to 
follow you to your current favourite food site, the benefits of persistence may be 
swamped by the costs of attracting a large and hungry crowd. Clarification of 
these issues must wait for future work. 
5 Conclusions 

We see foraging behaviour in rats as a paradigm case in which a seemingly com- 
plex instance of social (or even cultural) information transmission can be ex- 
plained through the action of simple behaviours in an appropriately structured 
environment. In fact, the behaviour underlying rat food preference copying is 
even simpler than was originally expected by those looking for simple mecha- 
nisms. That is, rats pay no attention to whether the individual they are learning 
a preference from is suffering from food poisoning. Our model has added to this 
picture by showing that this strategy may well have evolved just because eating 
poison lessens the likelihood that a rat will survive to influence any of its con- 
specifics. Similarly, when following behaviour is possible and occurs at arbitrary 
levels in the population, rats do not even need to remember the site where they 
found food the night before. 
Abstract. Since male primates are bigger and stronger than females, they are by 
default considered dominant. When females in a cohesively grouping ape (but not in its 
loosely grouping relative) often appear dominant to males, the static 
image of female weakness is maintained by attributing female dominance to 
high, species-specific co-operation among several females against single males. 

In this paper, an individual-oriented model is used to produce a parsimonious 
alternative: female dominance over males may directly vary with group- 
cohesiveness without species-specific differences in co-operative tendencies 
among females. The model consists of a homogeneous world in which entities 
roam. Entities are so constructed as to have merely a tendency to group and 
perform dominance interactions. ‘Male’ entities (StrongTypes) are 
characterised by a higher initial dominance value and intensity of attack than 
‘female’ entities (called WeakTypes). Dominance values change and evolve 
due to the self-reinforcing effects of winning and losing contests. In the model, 
more rank-overlap between both types arises from a stronger feedback between 
dominance and spatial structure in cohesive than in looser groupings. Biological 
implications of these phenomena and testable hypotheses for real animals are 
discussed. 



1 Introduction 

High dominance rank is supposed to be associated with many benefits and 
accordingly it is thought to be of central importance in primate social behavior (1). 
Although many anthropoid primate species live in permanently bi-sexual groups, 
hardly any data on male-female dominance relations exist (2). Instead, dominance 
relations are usually studied within each sex separately, and females, being often 
smaller and of inferior fighting capacity, are considered to be subordinate to males by 
default. Although case histories of females being dominant to all or some males in 
some groups of certain species are known (3), they are usually disregarded as 
abnormal. For those few species in which it is consistently reported that adolescent or 
adult males have difficulties in dominating females, an additional assumption is made 
that females of these species have a stronger than usual tendency to form large 
coalitions against single males (4,5). In this way, the generally accepted image of 
weak individual females is maintained because female dominance is attributed to 
collective force of larger numbers. 

It is common to attribute social characteristics to internal qualities, so that the 
observation of different social characteristics at the group level automatically leads 
to a search for different individual behaviour. An alternative, more integrative 
approach is possible, and we follow it here. In this approach, social 
characteristics are studied and explained within the context of the typical living 
conditions of the species. For instance, Deneubourg and co-authors (6) described a very different 
swarming pattern for each of three ant species. Instead of attributing this to internal 
differences between the species, they showed, by means of a simulation, that under 
the different feeding conditions that are characteristic of the three species, one and the 
same set of rules leads to the three characteristic swarming patterns. Thus, the 
swarming patterns result from the interactions between the entities and their 
environment. 

The same, contextual approach will be followed in this paper as regards the 
difference in social behaviour in two related chimpanzee species, the common and 
pygmy chimpanzee. Female dominance over males manifestly occurs in the pygmy 
chimpanzee, whereas male dominance is the rule in the common chimpanzee (7). Following 
the traditional scenario, female dominance in the pygmy chimpanzee is attributed to the 
formation of coalitions against males, but such coalitions have also been described for 
common chimpanzees, for instance by de Waal (8). Since both species differ 
significantly in cohesiveness of grouping (the pygmy chimpanzees group more 
cohesively (7)), we will focus here on explaining the described contrast in female 
dominance as a side-effect of the difference in cohesiveness per se, without assuming 
a higher tendency among females to form coalitions against males. 

In our approach we discard the fixed image of dominance, because cherishing it 
implies the assumption that there is a strong heritable component in dominance rank. 
This view is supported by some (9), but it is contradicted by the failure to replicate 
former dominance relations in experiments in which individuals are reshuffled 
between groups (10). 

Others, such as Rowell (11) and Chase (12), consider dominance not to be strongly 
inherited but to be due to chance and the self-reinforcing effects of winning 
and losing. Whereas the winner of the first encounter is decided purely by chance, 
after winning or losing once, the effects of winning and losing are self-reinforcing. 
For instance, Chase has shown by means of experiments that after losing, a monkey is 
more likely to lose again even when it encounters a much smaller opponent whom it 
would have defeated easily under other circumstances. This points to a strong psycho- 
physiological, experience-based component in dominance rank. 

Furthermore, an individual-oriented model (13,14) has shown how dominance may be 
subject to self-organisation, because characteristics of the dominance hierarchy and 
spatial structure (specifically the central location of dominants) are mutually 
reinforcing. 

Taking the last-mentioned approach, we are now going to study the effects of 
cohesiveness on rank-overlap between both sexes by means of an individual-oriented 
model. Since primate species have been described as differing in their intensity of attack (15), 
we study virtual entities with different species-specific intensities of attack. The present 
model deals with dominance interactions among grouping entities and was originally 
inspired by a model designed by Hogeweg (13). It consists of a homogeneous world 
with entities that merely aggregate and that, upon meeting each other, perform 
dominance interactions if risks are low (14,16-18). The superior fighting capacity of 
males compared to females is represented by a higher intensity of aggression 
(following findings on primates by Bernstein (19)) and by a higher initial rank value in 
so-called ‘StrongType’ entities than in the ‘female’-inspired entities called 
‘WeakTypes’. Within each type of entity, individuals are completely identical at the 
beginning of each run. The outcome of the model will be used to produce testable 
hypotheses for real primates. 



2 Methods 

In this section, a description of the model and behavioural measures is given. 



2.1 The Model 

The model is individual-oriented and event-driven (see 20). It is written in Object 
Pascal (Borland Pascal 7.0) and consists of a ‘world’ (toroid) with its interacting 
agents, its visualisation and special entities that collect and analyse data on what 
happens in the ‘world’ (cf. the ‘recorders’ and ‘reporters’ of Hogeweg, 1988). The 
‘world’ consists of a space of 200 by 200 units. At the start of each run entities 
occupy random locations within a predefined subspace of 30 by 30 units. Since the 
space of the world is continuous, agents are able to move in any direction. They have 
an angle of vision of 120 degrees and their maximum perception distance (MaxView) 
is 50 units. Activities of agents are regulated by a timing regime, as follows. Each 
entity draws a random waiting time from a uniform distribution. The entity with the 
shortest waiting time is activated first. The lapse of waiting time is usually the same 
for all entities, but if a dominance interaction occurs within NearView of an agent, its 
waiting time is reduced, thus increasing the probability that the agent will be 
activated. Agents group and perform dominance interactions according to a set of 
rules described below (figure 1). 
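The timing regime can be sketched as a small event-driven scheduler. This is an illustrative sketch, not the original Pascal code. The maximum waiting time of 1.0 and the halving of a bystander's waiting time are hypothetical values, since the text says only that waiting times are uniform and are "reduced" near a fight.

```python
import random
from typing import Dict, Iterable


def activate_next(waits: Dict[str, float], rng: random.Random,
                  max_wait: float = 1.0) -> str:
    """Activate the entity with the shortest remaining waiting time."""
    active = min(waits, key=waits.get)
    elapsed = waits[active]
    for name in waits:                 # the same lapse of time for every entity
        waits[name] -= elapsed
    waits[active] = rng.uniform(0.0, max_wait)  # new uniform waiting time (max_wait assumed)
    return active


def alert_bystanders(waits: Dict[str, float], bystanders: Iterable[str],
                     factor: float = 0.5) -> None:
    """Shorten the waiting time of entities within NearView of a dominance
    interaction (factor is a hypothetical choice)."""
    for name in bystanders:
        waits[name] *= factor
```

A shorter remaining waiting time means an entity is activated sooner, so alerting bystanders raises their probability of acting next, as described above.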



2.1.1 Grouping rules 

Usually, two opposing forces affecting group structure are postulated; on the one hand 
animals are attracted to one another because participation in a group provides safety; 
on the other, aggregation implies competition for resources, and this drives 
individuals apart (e.g., 21). 

The forces leading to aggregation and spacing are realised in the model by a set of 
rules that are graphically displayed in Figure 1 (see 14). 
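The grouping rules in Figure 1 depend on which other entities an agent can perceive. Using the world geometry given in Section 2.1 (a 200 by 200 toroid, a 120-degree angle of vision, MaxView = 50), a minimal perception test might look as follows. Two details are assumptions: the heading is taken as the direction of movement in radians, and the field of view is taken as symmetric about it.

```python
import math
from typing import Tuple

WORLD_SIZE = 200.0                 # toroidal world of 200 by 200 units
MAX_VIEW = 50.0                    # maximum perception distance (MaxView)
VISION_ANGLE = math.radians(120)   # total angle of vision

Point = Tuple[float, float]


def torus_delta(a: float, b: float, size: float = WORLD_SIZE) -> float:
    """Shortest signed displacement from a to b along one axis of the torus."""
    return (b - a + size / 2) % size - size / 2


def can_see(pos: Point, heading: float, other: Point) -> bool:
    """True if `other` lies within MaxView and inside the field of view
    centred on `heading` (radians)."""
    dx = torus_delta(pos[0], other[0])
    dy = torus_delta(pos[1], other[1])
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist > MAX_VIEW:
        return False
    bearing = math.atan2(dy, dx)
    # Angle between heading and bearing, wrapped into [-pi, pi).
    off = (bearing - heading + math.pi) % (2 * math.pi) - math.pi
    return abs(off) <= VISION_ANGLE / 2
```

Measuring distance along the shortest way around the torus means an entity near one edge can see a partner just across the opposite edge, because the continuous world has no borders.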
Fig. 1. Flow chart for the behavioural rules of the entities 



2.1.2 Dominance interactions 

Dominance interactions are competitive interactions over resources that are not 
specified in this model, but are presumed to include food, mates and spatial location. 
Competitive interactions are only initiated if the perceived risks of defeat are low 
(the so-called Risk-Sensitive system (14)). Interactions between agents are modelled after 
Hogeweg (22) and Hemelrijk (14), as follows: 

- Each entity has a variable DOM (representing the capacity to win a hierarchical 
interaction). 

- After meeting one another in their PerSpace, entities ‘decide’ whether or not to 
attack following the Risk-Sensitive system. Here, the probability to attack 
decreases according to the potential risk of defeat as follows. Upon meeting 
another agent and observing its DOM-value, an entity may predict it will win or 
lose on the basis of a ‘mental’ battle, which follows the rules of a dominance 
interaction as described below. If ego loses the mental interaction, it will refrain 
from action (thus displaying ‘non-aggressive’ proximity). If it wins the mental 
battle, it will start a ‘real’ dominance interaction. 

- If an actual dominance interaction takes place, then entities display and observe 
each other’s DOM. Subsequent winning and losing is determined by chance and 
values of DOM as follows: 

$$
w_i = \begin{cases} 1 & \text{if } \dfrac{DOM_i}{DOM_i + DOM_j} > \mathrm{RND}(0,1) \\ 0 & \text{otherwise} \end{cases} \qquad (1)
$$

Here $w_i$ is the outcome of a dominance interaction initiated by agent $i$ (1 = winning, 
0 = losing). In other words, if the relative dominance value of the interacting agents 
is larger than a random number (drawn from a uniform distribution), then agent $i$ 
wins, else it loses. Thus, the probability of winning is larger for whoever is higher 
in rank, and it equals the agent's DOM-value relative to that of its partner, $DOM_i/(DOM_i + DOM_j)$. 
- Updating of the dominance values is done by increasing the dominance value of 
the winner and decreasing that of the loser: 

$$
DOM_i := DOM_i + \left( w_i - \frac{DOM_i}{DOM_i + DOM_j} \right) \cdot STEPDOM
$$
$$
DOM_j := DOM_j - \left( w_i - \frac{DOM_i}{DOM_i + DOM_j} \right) \cdot STEPDOM \qquad (2)
$$



The consequence of this system is that it functions as a damped positive feedback: 
a victory of the higher ranking agent reinforces their relative DOM-values only 
slightly, whereas success of the lower ranking agent gives rise to a relatively large 
change in DOM. (To keep DOM-values positive, their minimum value is, 
arbitrarily, set at 0.01.) The change in DOM-values is multiplied by STEPDOM, 
a scaling factor that varies between 0 and 1 and represents the intensity of 
aggression. High values imply a large change in DOM-value when updating it, 
and thus indicate that single interactions may strongly influence the future 
outcome of conflicts. Conversely, low STEPDOM-values represent low impact. 

- Winning includes chasing the opponent over one unit distance and then turning 
randomly 45 degrees to right or left in order to reduce the chance of repeated 
interactions between the same opponents. The loser responds by fleeing under a 
small random angle over a predefined FleeingDistance. 
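Equations (1) and (2), together with the risk-sensitive ‘mental’ battle, can be sketched in a few lines. This is an illustration, not the original implementation. The text does not say whose STEPDOM applies when a StrongType meets a WeakType, so here the caller passes a single `stepdom` value, for instance the attacker's (an assumption). Movement (chasing and fleeing) is omitted.

```python
import random
from typing import Optional, Tuple

MIN_DOM = 0.01  # DOM-values are kept positive (minimum set at 0.01)


def win_probability(dom_i: float, dom_j: float) -> float:
    """Relative dominance of i with respect to j: DOM_i / (DOM_i + DOM_j)."""
    return dom_i / (dom_i + dom_j)


def contest(dom_i: float, dom_j: float, rng: random.Random) -> int:
    """Equation (1): i wins (1) if its relative DOM exceeds a uniform random number."""
    return 1 if win_probability(dom_i, dom_j) > rng.random() else 0


def update_dom(dom_i: float, dom_j: float, w_i: int,
               stepdom: float) -> Tuple[float, float]:
    """Equation (2): damped positive feedback scaled by STEPDOM."""
    delta = (w_i - win_probability(dom_i, dom_j)) * stepdom
    return max(dom_i + delta, MIN_DOM), max(dom_j - delta, MIN_DOM)


def encounter(dom_i: float, dom_j: float, stepdom: float,
              rng: random.Random) -> Tuple[float, float, Optional[int]]:
    """Risk-sensitive encounter: attack only if ego first wins a 'mental' battle."""
    if contest(dom_i, dom_j, rng) == 0:
        return dom_i, dom_j, None          # non-aggressive proximity; nothing changes
    w_i = contest(dom_i, dom_j, rng)       # the 'real' dominance interaction
    new_i, new_j = update_dom(dom_i, dom_j, w_i, stepdom)
    return new_i, new_j, w_i
```

With the initial values of Section 2.2 (DOM 20 for a StrongType, 10 for a WeakType) and STEPDOM = 1, a StrongType victory shifts each value by only 1/3, whereas a WeakType victory shifts each by 2/3. This is the damped positive feedback described above.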

From now on, the initiation of a dominance interaction is referred to for short as 
‘attack’. 



2.2 Experimental set-up and Data collection 

Here, the same parameter setting (n=8, persSpace=2, nearView=24, 
FleeingDistance=2 units) is used as in a former study (23). 

The present study is confined to a population size of ten entities consisting of two 
types that differ in fighting capacity. Reflecting the physiologically superior fighting 
abilities of males (e.g. muscle structure) compared to females, StrongTypes start with 
a higher winning tendency than WeakTypes (i.e. of 20 versus 10) and display a 
higher intensity of aggression. I have experimented with three different StepDom 
values (i.e. 1.0, 0.5 and 0.25 for StrongTypes). StepDom values for WeakTypes were 
always 80% of those of StrongTypes (respectively 0.8, 0.4, 0.20). 

Varying the SearchAngle (i.e. 45, 90 and 180 degrees) produces three degrees of 
cohesiveness of grouping. For each combination of StepDom and SearchAngle, 5 runs 
are conducted, resulting in a total of 5 × 3 × 3 = 45 runs. 
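The experimental design amounts to a full factorial grid. A short sketch enumerates it from the values given above. The dictionary keys are hypothetical names chosen for readability.

```python
from itertools import product
from typing import Dict, List

STRONG_STEPDOM = (1.0, 0.5, 0.25)   # intensity of aggression of StrongTypes
WEAK_FACTOR = 0.8                   # WeakTypes use 80% of the StrongType value
SEARCH_ANGLES = (45, 90, 180)       # degrees: low, medium and high cohesiveness
RUNS_PER_CONDITION = 5
INITIAL_DOM = {"strong": 20.0, "weak": 10.0}


def experiment_grid() -> List[Dict[str, float]]:
    """One entry per run: 3 StepDom values x 3 SearchAngles x 5 replicates."""
    conditions = []
    for stepdom, angle, run in product(STRONG_STEPDOM, SEARCH_ANGLES,
                                       range(1, RUNS_PER_CONDITION + 1)):
        conditions.append({
            "stepdom_strong": stepdom,
            "stepdom_weak": WEAK_FACTOR * stepdom,
            "search_angle": angle,
            "run": run,
            "dom_strong_init": INITIAL_DOM["strong"],
            "dom_weak_init": INITIAL_DOM["weak"],
        })
    return conditions
```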

During a run, every change in spatial position and in heading direction of each 
entity is recorded. After every time step (consisting of 160 activations), the distance 
between agents is calculated. Dominance interactions are continuously monitored by 
recording: 1) the identity of the attacker and its opponent; 2) the winner/loser; 3) the 
updated DOM-values of the entities. 
2.3 Measurements 

At intervals of two time steps (320 activations), the degree of rank differentiation and 
the overlap between the dominance hierarchies of StrongTypes and WeakTypes are 
measured as follows. 

Rank differentiation is measured by the coefficient of variation (standard deviation 
divided by the mean) of Dom-values (24). For each run the average value is 
calculated. Higher values indicate larger rank distances among entities. 
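Rank differentiation as defined above can be computed as follows. The text does not say whether the population or the sample standard deviation is used, so the population form (`statistics.pstdev`) is an assumption, as are the helper names.

```python
import statistics
from typing import Sequence


def rank_differentiation(dom_values: Sequence[float]) -> float:
    """Coefficient of variation of DOM-values: standard deviation / mean."""
    return statistics.pstdev(dom_values) / statistics.fmean(dom_values)


def run_average(snapshots: Sequence[Sequence[float]]) -> float:
    """Average of the measure over the snapshots taken every two time steps."""
    return statistics.fmean(rank_differentiation(s) for s in snapshots)
```

For example, a hypothetical population of five StrongTypes at DOM 20 and five WeakTypes at DOM 10 has a mean of 15 and a standard deviation of 5, giving a coefficient of variation of 1/3 before any interaction has taken place.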

At the start of each run, all StrongTypes are dominant over every WeakType, but 
during run-time some WeakTypes may become dominant over (some or all) 
StrongTypes. The degree of dominance of WeakTypes over StrongTypes is estimated 
by the Mann-Whitney U-statistic (25). At the beginning of the run, U-values are zero. 
Later on they may become positive. 
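Counted from the WeakType side, the U-statistic is the number of (WeakType, StrongType) pairs in which the WeakType has the higher DOM-value. This explains why it is zero at the start of a run. A minimal sketch follows, assuming the usual convention that ties count one half; the function name is hypothetical.

```python
from typing import Sequence


def weak_over_strong_u(weak_doms: Sequence[float],
                       strong_doms: Sequence[float]) -> float:
    """Mann-Whitney U from the WeakType side: pairs in which the WeakType
    outranks the StrongType, with ties counting one half."""
    u = 0.0
    for w in weak_doms:
        for s in strong_doms:
            if w > s:
                u += 1.0
            elif w == s:
                u += 0.5
    return u
```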

Bidirectionality of attack is calculated as a τKr-correlation between an actor and 
receiver matrix of attack (26). This statistic measures the correlation between the 
corresponding rows of two social interaction matrices. The method takes into account the 
statistical dependency due to recurrent observations of the same individual. The 
measure of unidirectionality of attack corresponds to the sign-changed τKr-value. 

The clustering together of entities of the same type is measured as a τKr-correlation 
between a matrix of mean distance among entities and a ‘hypothesis’-matrix. The 
‘hypothesis’-matrix reflects type-segregation because cells belonging to entities of 
the same type are filled with the number 1 and cells of different types are filled with 
zeros. Segregation is thus reflected by a positive correlation. 

The degree to which dominants occupy the centre is measured by the spatial 
directions of others around ego. Using circular statistics (27), the centrality of each 
individual is calculated for each scan by drawing a unit circle around it and projecting 
the direction of other group members (as seen by ego) as points on the circumference 
of this circle. Connecting these points with the origin produces vectors. The length of 
the mean vector represents the degree to which the position of group members relative 
to ego is clumped; longer mean vectors reflect more clustering in one direction and 
indicate lower centrality (i.e. lower ‘encirclement’). Thus, greater centrality of higher 
ranking entities is reflected in a stronger positive correlation between rank and 
encirclement. 
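The centrality measure can be sketched with basic circular statistics. Two assumptions apply: encirclement is taken as one minus the mean vector length (the text only says the two are inversely related), and positions are treated as planar, ignoring toroidal wrap-around. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mean_vector_length(ego: Point, others: Sequence[Point]) -> float:
    """Length of the mean of unit vectors pointing from ego to each group member."""
    angles = [math.atan2(y - ego[1], x - ego[0]) for x, y in others]
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(mean_cos, mean_sin)


def encirclement(ego: Point, others: Sequence[Point]) -> float:
    """Assumed to be 1 - mean vector length: 1 = fully surrounded, 0 = all on one side."""
    return 1.0 - mean_vector_length(ego, others)
```

An entity with group members evenly spread around it gets an encirclement close to 1. An entity at the edge of the group, with everyone in the same direction, gets a value close to 0.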

To exclude a possible bias brought about by transient values, the correlations for 
centrality of dominants, for unidirectionality and between social behaviour and rank 
of the partner, are calculated on data collected after time-step 200. 



3 Results 



3.1 Cohesiveness of groups 

Cohesiveness of grouping decreases at lower values of SearchAngle, because 
entities return to others later. This is in line with former findings based on a discrete, 
lattice-based world (16,17). Different degrees of cohesiveness will be referred to as 
high (SearchAngle=180°), medium (SearchAngle=90°) and low cohesiveness or loose 
groupings (SearchAngle=45°). Looser grouping diminishes the frequency of 
interaction. StepDom affects mean distance much less than SearchAngle does, and only 
affects distance for the most cohesive grouping (i.e. via route I, described below). 



Fig. 2. Means and standard errors are calculated over 5 runs. H = high, M = medium, L = low. 
A. Standard error above and below the mean U-statistic (measuring dominance of Weak- over 
StrongType) for different degrees of cohesiveness and StepDom-values. B. Mean Rank- 
differentiation measured by the coefficient of variation of Dom-values for different conditions. 
Lines of increasing boldness correspond to increasing StepDom-values. C. Differentiation of 
Dom-values for Strong- and WeakType entities with StepDom = 1; typical case as observed in 
one run of a highly cohesive and a loosely grouping VirtualSpecies. Grey: WeakTypes, black: 
StrongTypes. Bottom left: standard error above and below the mean of D. the centrality measure 
(correlation between Dom-value and Encirclement) and E. the measure for unidirectionality. 



3.2 Rank-overlap and other social-spatial consequences 

In line with expectations, WeakTypes dominate StrongTypes more at higher 
levels of cohesiveness (Figure 2A). This holds at all intensities of attack. Furthermore, 
the degree of rank-overlap between both Types also increases with higher intensity of 
attack (Figure 2A). This can be explained by the fact that both high intensity and cohesiveness 
cause larger rank-differentiation (Figure 2B). Thus, ranks within each Type, Strong 
and Weak, diverge more, so that the hierarchies of the two types overlap sooner (Figure 
2C). 

The stronger hierarchical differentiation in medium and highly cohesive groups 
arises from a complex interplay with spatial structure. This interplay can be described 
in terms of three sub-routes (Figure 3). 

First, while the hierarchy differentiates in the course of time the entities space out. 
This is due to the stronger role diversification that accompanies rank differentiation: 
some entities become permanent losers and by fleeing again and again from others 
they move further and further away from their group members. In this way the group 
spaces out. Spacing out causes the frequency of aggression to decline. This naturally 
implies a smaller number of opportunities for rank-reversal, and thus a more stable 
hierarchy. The stability of the hierarchy, in turn, maintains its differentiation. A more 
differentiated hierarchy is more stable (Figure 3, route I). 

Second, differentiation of the hierarchy also increases unidirectionality of attack 
and this enhances its stability, which in turn supports its differentiation (Figure 3, 
route II). 

Third, strong hierarchical differentiation and stability cause, and are maintained by, 
a spatial structure with dominants in the centre and subordinates at the periphery. 
Spatial centrality of dominants emerges because entities of similar rank have equal 
chances of being defeated by or defeating each other. Since they are also treated in a 
similar way by other group-members, they will remain at approximately the same 
location. Lower-ranking entities are chased away by more group-members and 
therefore end up at the periphery. In turn, spatial structure also maintains the stability 
and differentiation of the hierarchy, because entities mainly interact with partners that 
are close in rank. It follows that if a rank-reversal occurs, it will not be a dramatic one 
(Figure 3, route III). 




Fig. 3. Summary of interrelations of different variables. Numbers indicate processes leading to 
increased rank differentiation described in the text. 

Although these three routes are interconnected, significant correlations for these 
effects are mainly found at medium and high values for cohesiveness and intensity of 
attack (see stronger average spatial centrality (Figure 2D) and average 
unidirectionality (Figure 2E) found at these conditions). 

Cohesive groups differ from loose groups in two respects: the average distance and 
the total frequency of attack. To control for the total frequency of attack, we run 
loose groups proportionally longer than highly cohesive groups (i.e. ten times as long). In 
loose groups, the hierarchical differentiation, spatial centrality and unidirectionality of 
attack appear weaker than in strongly cohesive groups after the same frequency of interactions 
(Figure 4), and the three routes of social-spatial structuring (based on correlations over 
periods lengthened proportionally to contain the same mean frequency of attack as in 
cohesive groups) are non-significant. Yet, unexpectedly, after the same frequency of attack, 
the dominance of WeakTypes over StrongTypes is similar to that in highly cohesive 
groups! This may be due to lower sexual segregation in loose than in cohesive groups (Figure 
4), which may lead to relatively higher frequencies of interactions between the sexes. 
As a consequence, StrongTypes may be ‘pulling up’ the Dom-values of WeakTypes, and WeakTypes 
‘pulling down’ those of StrongTypes, dissolving the rank-differences between 
them. These details will be tested in a subsequent study. 





Fig. 4. Comparison of social-spatial measures (mean ± S.E.) between highly cohesive
and loose groups controlled for the total frequency of attack.



4. Discussion 



4.1 The model 

Effects of cohesiveness are similar to those of intensity of aggression (for group
size 8, see 14, 18): both result in a steeper hierarchy and, consequently, in more
rank-overlap between types.

In more cohesive groups the initial two rank classes to which StrongTypes and
WeakTypes belong clearly dissolve faster, due to more frequent encounters. This
effect was also described by Hogeweg and co-authors (22, 28) in a model of
‘BumbleBees’. The Bumble colony consisted of a Queen with a high initial rank and
many Workers with a low initial Dom-value. Dissolution of the rank differences was
correlated with speed of nest growth. In a slow-growing colony, Workers had much
time to interact with the Queen and consequently relatively many Workers transferred
to the ‘elite’ rank category of the Queen. In a fast-growing colony, Workers were
more occupied with rearing young and so were left with fewer opportunities for
interaction. Due to the rarity of interactions, the hierarchy among Workers remained
weak and labile, so that only a few Workers transferred to the ‘elite’ category.
Thus, the ranks of Workers and the Queen were kept apart more distinctly.
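Why more frequent encounters dissolve initial rank classes can be illustrated with a minimal sketch. It assumes a self-reinforcing update in the style of Hogeweg's models: entity i beats j with probability Dom_i/(Dom_i + Dom_j), and the winner gains (the loser loses) Dom in proportion to how unexpected the outcome was, scaled by an intensity parameter here called `step_dom`. This update rule, the well-mixed choice of partners (space is ignored) and the starting Dom-values of 16 and 8 are illustrative assumptions, not details given in this section.

```python
import random


def win_probability(dom_i: float, dom_j: float) -> float:
    """Assumed chance that entity i defeats entity j, given their Dom-values."""
    return dom_i / (dom_i + dom_j)


def interact(dom: list, i: int, j: int, step_dom: float, rng: random.Random) -> None:
    """One assumed self-reinforcing dominance interaction between i and j."""
    p = win_probability(dom[i], dom[j])
    outcome = 1.0 if rng.random() < p else 0.0
    delta = (outcome - p) * step_dom      # unexpected outcomes shift Dom-values most
    dom[i] = max(dom[i] + delta, 0.01)    # keep Dom-values positive
    dom[j] = max(dom[j] - delta, 0.01)


def rank_overlap(dom: list, strong: range, weak: range) -> float:
    """Fraction of (StrongType, WeakType) pairs in which the WeakType ranks higher."""
    pairs = [(s, w) for s in strong for w in weak]
    return sum(dom[w] > dom[s] for s, w in pairs) / len(pairs)


def mean_overlap(n_interactions: int, replicates: int = 100,
                 step_dom: float = 0.5, seed: int = 0) -> float:
    """Average rank overlap after a given number of interactions per group."""
    rng = random.Random(seed)
    strong, weak = range(0, 4), range(4, 8)
    total = 0.0
    for _ in range(replicates):
        dom = [16.0] * 4 + [8.0] * 4          # two initial rank classes
        for _ in range(n_interactions):
            i, j = rng.sample(range(8), 2)    # well-mixed partners
            interact(dom, i, j, step_dom, rng)
        total += rank_overlap(dom, strong, weak)
    return total / replicates


if __name__ == "__main__":
    for n in (5, 50, 500, 5000):
        print(n, round(mean_overlap(n), 3))
```

With only five interactions no WeakType can overtake a StrongType, because each Dom-value moves by less than `step_dom` per interaction; after thousands of interactions the two classes overlap. Raising `step_dom`, the analogue of intensity of attack, should speed this up in the sketch.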
4.2 Biological Implications

Increased cohesiveness not only obscures rank-categories belonging to different 
types, but also enhances spatial centrality of dominants. In the model, such spatial 
structure emerges although entities do not prefer a central, or some other, spatial 
location. Therefore, the model presents us with a parsimonious alternative to the 
selfish herd theory (29) in which it is assumed that individuals evolved a ‘centripetal 
instinct’, because it is safer to be surrounded by others as a protection against possibly 
approaching predators. 

In confirmation of the model, a preference for spatial centrality has so far not been
discovered in real animals, not even in the very elegant fish experiments carried out
by Krause (30). What is clear, however, is that upon perceiving signals of a
predator, fish tighten their shoals. In a shoal consisting of small and large fishes,
increased cohesiveness leads to size assortment, with large fishes in the centre and
small ones at the periphery (31). Again, this is in line with our model: because
large fishes were dominant over small ones, size-assortative shoaling is exactly what
is expected to occur, and there is no need to assume a preference for any particular
location.

This model represents only the barest essentials of group-living and dominance
interactions. It bears no resemblance to highly intelligent and complex apes. Yet, the
essential characteristics of this model may hold for any group-living species that
performs dominance interactions. Presumably, it is a general rule that cohesiveness
entails more rank-overlap among types. Thus, these results confirm the hypothesis
that initially inspired this study: the fact that female dominance is relatively
stronger in pygmy chimpanzees than in common ones may, at least partly, be due to
their stronger cohesiveness of grouping. It is therefore unnecessary to invoke
additional differences among species in co-operative tendencies among females against
males. The model does not, however, preclude that coalitions among females contribute
to female dominance. Yet even if future study confirms a relatively higher frequency
of female coalitions among pygmy chimpanzees than among common ones, it is not clear
whether these coalitions are a cause or a result of female dominance: after all,
higher female rank, arising from stronger cohesiveness, may allow females to
co-operate more intensely against males than in those cases in which females rank
lower.

Cohesiveness of groupings varies not only between species (see, e.g., 32), but also
within species according to environmental conditions. Chimpanzees, for instance,
group more cohesively in less seasonal areas (33). It would therefore be interesting
to compare the degree of rank-overlap between the sexes in different environments.

Thus, these individual-oriented models produce new hypotheses for studying
social-ecological processes and their consequences, and direct our research towards
a more situated, context-based and realistic approach.
Abstract. Parents raising multiple offspring must decide how to divide
resources between them. Much empirical data on the parenting behaviour of
particular species has been collected. Birds, in particular, have been shown to
follow a number of provisioning rules. However, the adaptive significance of this
variation in decision-making strategies has been largely unexplored. Here we
present a simulation model of the western bluebird, Sialia mexicana, with which we
explore the utility of various simple feeding heuristics. The simulated parents
face the task of simultaneously raising several offspring who are of differing
ages and thus have differing resource needs. We show that the success of simple
rules of thumb varies with environmental parameters in a manner which (i) predicts
experimental results in the biology literature, and (ii) can be explained using a
notion of parental egalitarianism.



A repeated finding within the field of artificial life has been that complex
adaptive behaviour may arise from the interaction between simple systems and
their environment. This perspective contrasts starkly with received wisdom within
the decision-making literature, where simplicity is often automatically equated
with stupidity. The reason for this prejudice is easy to see. The normative
theories which currently dominate research into decision making (probabilistic
reasoning, Bayesian reasoning, various propositional logics, etc.) often involve
the complex integration of many pieces of information and the construction of
elaborate models.

This complexity is geared towards achieving decision-making’s gold standard:
rational coherence. The proponents of these complex normative models claim that if
one follows their prescriptions, and the assumptions that these models require in
order to work properly are met, one is sure never to behave intransitively, never
to hold inconsistent beliefs, and never to fall foul of reasoning fallacies or other
delusions. In contrast, the naive roulette-wheel gambler who bets his last dime on
thirteen because it has not come up for the entire time he has been sitting at the
table, or simply because it is his lucky number, reveals through his simplicity
that he is irrational and thus exploitable.

The appeal of rationality is very strong, and the failure of human and animal
decision-making to live up to its high standards has been the cause of some
consternation amongst psychologists and economists [1]. However, two recent
research movements have challenged the rationality-based hegemony and suggested
that successful, adaptive behaviour in the real world may not require the complex
machinery advocated by the orthodoxy, but may instead rely on simple heuristics
which, although limited in their ability, are suited to the environments in which
they must operate.

The first of these research programmes has become associated with the
behaviour-based robotics field as it has tried to extricate itself from good
old-fashioned artificial intelligence (GOFAI) [2]. The second is exemplified by
decision-making researchers who are beginning to question the benefit of rational
coherence if it can only be obtained at the expense of tractability and
psychological reality [3].

The animal behaviour literature has proved influential in both of these cases.
Roboticists have discovered that organisms have achieved robust, simple solutions
to many of the problems that roboticists themselves face as engineers, and that
these natural solutions reveal that the elaborate planning and modelling which
GOFAI assumes goes on inside the heads of even the simplest intelligent agents is
to a surprising extent unnecessary [4]. Similarly, decision-making theorists have
begun to appreciate that ethologists have had considerable success explaining
animal behaviour from a perspective which recognises that evolution will favour
accurate and reliable reasoning, but accepts that this reasoning must be carried
out in the real world by real mechanisms [5].

Here we approach an ecologically relevant decision-making task which is typically
regarded as intractable from the perspective of normative frameworks, and
demonstrate that simple, ecologically sound rules of thumb can achieve high levels
of performance and that their performance predicts empirical data collected by
animal behaviour researchers.

1 Parental Investment 

A decision-making problem which besets all animals that care for multiple
simultaneous young is how to divide resources amongst these offspring. A classical
approach to this problem would be to treat it as a game against nature in which the
parent must play a strategy which maximises return on investment over the period of
care. However, since the moves in such a game are many and several options are
available at each move, the game tree which must be analysed for such an approach
to yield a solution is prohibitively large.
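A rough count shows how quickly the game tree grows. The sketch below borrows the figures of the simulation described in Section 2 (four chicks, 14 hours of foraging per day in 10-minute steps, 20 days) and assumes that every daylight slot branches into "no bug found" or "bug fed to one of the four chicks". It ignores staggered hatching, full stomachs and deaths, so the result is an illustrative order of magnitude, not a figure from the paper.

```python
import math


def game_tree_leaves(decision_slots: int, n_chicks: int) -> int:
    """Leaves of a full tree in which every slot branches into 'no bug found'
    or 'bug found and fed to chick k', i.e. 1 + n_chicks branches per slot."""
    return (1 + n_chicks) ** decision_slots


slots = 14 * 6 * 20        # assumed: 10-minute daylight slots over 20 days
leaves = game_tree_leaves(slots, n_chicks=4)
digits = len(str(leaves))
# Cross-check the digit count with logarithms: digits = floor(n * log10(5)) + 1.
assert digits == math.floor(slots * math.log10(5)) + 1
print(f"{slots} slots -> a number of leaves with {digits} digits")
```

Even this crude upper bound has well over a thousand decimal digits, which is why exhaustive analysis is out of reach.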

For example, Becker [6] provides an economic analysis of how rational human
parents should distribute investment among their children, assuming that parents
are trying to maximise aggregate child quality as defined by the sum of all the
children’s wealth as adults. This quality is a function of the resources invested
in the child, the child’s own skill and abilities, and any extra income he or she
might earn through sheer luck. Becker assumes that there are diminishing returns
on parental investment, and shows that as long as these diminishing returns are the
same for all children, parents should distribute investment such that each child
achieves the same degree of wealth. However, if some children are capable of
accumulating more wealth per unit of parental investment than others, then parents
should of course favour them.
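The equal-wealth result can be illustrated with a small hypothetical model. Assume (this is not Becker's own formulation) that adult wealth is `returns(ability + investment)` with the same concave `returns` function, here a square root, for every child, and that investment comes in whole units. Giving each successive unit to the child with the largest marginal gain then maximises total wealth, and it ends up equalising wealth across children.

```python
import math
from typing import Callable, List, Tuple


def allocate(abilities: List[float], budget_units: int,
             returns: Callable[[float], float] = math.sqrt
             ) -> Tuple[List[int], List[float]]:
    """Give each unit of investment to the child with the largest marginal gain.

    Hypothetical model: adult wealth = returns(ability + investment), with the
    same concave returns function for every child. For concave returns this
    greedy allocation maximises the sum of the children's wealth.
    """
    investment = [0] * len(abilities)
    for _ in range(budget_units):
        gains = [returns(a + x + 1) - returns(a + x)
                 for a, x in zip(abilities, investment)]
        best = max(range(len(abilities)), key=gains.__getitem__)
        investment[best] += 1
    wealth = [returns(a + x) for a, x in zip(abilities, investment)]
    return investment, wealth


investment, wealth = allocate([0, 4, 9], budget_units=26)
print(investment, [round(w, 3) for w in wealth])
```

The least able child receives the most investment, and all three end with the same wealth. The additive ability term is what makes equal marginal returns coincide with equal wealth; if ability were instead added after the returns function, the same rule would equalise investment rather than wealth.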

While these conclusions may sound reasonable in general, they are of limited 
use in making predictions about actual parental behaviour in specific situations. 
This is because they assume that parents have some means of calculating the 
effects of each unit of investment on the future payoff they expect to gain from a 
child. In practice, however, this calculation can require involved manipulations 
of information that is itself difficult to obtain. Children do not come equipped 
with investment meters for their parents’ convenience. 

In response to problems of this kind, biological models of parental investment
have dealt only with single offspring at a time or with multiple simultaneous
offspring assumed to be identical, or have been limited to cases where parents are
assumed to base individual investment decisions on fully informative offspring
solicitation signals [7-9]. In addition, previous models have treated parental
investment as a series of events that have independent consequences for offspring
fitness [10].

These simplified models of the parental investment problem have identified 
two simple rules which animals can employ in order to successfully raise as 
many offspring to reproductive age as possible. When each offspring requires the 
same amount of investment in order to reach a given level of fitness, then one 
parental solution would be to treat each on the basis of its need. In birds, for 
example, chicks often beg for food when hungry. If intensity of begging is an 
honest and accurate signal of need, parents would then be expected to feed their 
chicks according to this intensity in order to achieve investment equality. This 
is clearly a very simple decision rule. 

Alternatively, in the case that offspring are not born simultaneously (chicks, for
example, often hatch at different times), behavioural ecologists have identified a
second simple but adaptive parental decision rule: satisfy the oldest offspring
first. Since one major predictor of the probability of survival to adulthood is the
current age of offspring (the older they are, the closer they are to independence
and reproduction, and the more likely they are to make it all the way there),
parents are expected to benefit from preferentially investing in their older
offspring [7].

However, birds display a variety of parental feeding patterns. Coots preferentially
feed the smallest chicks [11]. Pigeons preferentially feed the hungriest [12].
Common swifts preferentially feed their largest/oldest¹ chicks [13]. Fieldfares
appear to feed randomly [14]. Despite the amount of published data on this topic,
there has been no proposal for why such a variety of strategies should exist.

2 Simulating Parental Investment 

Here we model the problem of parental investment as a series of interdependent
food-allocation decisions, in which investment is meted out in many small
individual doses. Since offspring are of different ages they will have different
resource demands. A successful parental investment strategy must balance these
demands over the care-giving period in an effective manner despite the
stochasticity of the foraging environment. The model we present is an iterative
computer simulation that mimics the feeding, digestion, metabolism, and growth of
asynchronously hatched western bluebird (Sialia mexicana) chicks from hatching
until fledging. This model allows us to explore the effects of important aspects of
parental decision making which are difficult to capture with more formal
approaches. More specifically, it allows us to explore the complex relationship
between a proximal decision rule and the ultimate effects of this rule.

¹ Feeding the largest of a brood of asynchronously hatched chicks is equivalent to
feeding the oldest, since it ensures that the oldest chick will remain the largest.

We compared the performance of the various parental strategies reported in 
the parental investment literature and some which were not (i.e., preferentially 
feeding the youngest or feeding in a fixed random order), in terms of number 
and weight of chicks fledged, across a range of different artificial environments. 
In doing so we assume that begging is an honest signal of need [9] and hence that
preferentially feeding the chick begging with the greatest intensity is equivalent
to preferentially feeding the hungriest chick.²

We chose to model western bluebirds because there is information available 
about both metabolic rates and growth rates for this species. Equations for 
metabolic rate, digestion rate, and stomach size of our simulated chicks were 
derived from published field data on growth and metabolic rates of bluebird 
chicks across the nestling period [15]. Values for the calorie content of the insects 
that parents feed to their chicks, and the proportion of metabolisable energy 
these items contain were taken from Dykstra & Karasov [16]. 

To mimic environmental differences in food availability, we varied the frequency
with which food was found and returned to the nest, using a parameter p governing
the probability of repeatedly finding food and q governing the probability of
repeatedly failing to find food. The decision rules we tested determined which
chick was fed on any given occasion, and it is the sum of these feeding decisions
over the course of the nestling period that determines overall parental success in
terms of the total size of all fledged chicks.
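One natural reading of p and q, assumed here because the text does not spell out the process, is a two-state Markov chain: after a successful trip the next one succeeds with probability p, and after a failed trip the next one fails with probability q. Under that assumption the long-run fraction of successful foraging trips, the food availability used to group environments in Fig. 1, is (1 − q)/((1 − p) + (1 − q)). The sketch computes it and checks it against a simulated chain; the function names are hypothetical.

```python
import random

GRID = [k / 10 for k in range(10)]                       # {.0, .1, ..., .9}
ENVIRONMENTS = [(p, q) for p in GRID for q in GRID]      # the 100 {p, q} pairs


def food_availability(p: float, q: float) -> float:
    """Long-run fraction of successful foraging trips (stationary probability).

    p: probability that a success is followed by another success.
    q: probability that a failure is followed by another failure.
    """
    return (1 - q) / ((1 - p) + (1 - q))


def simulate_foraging(p: float, q: float, steps: int, seed: int = 0) -> float:
    """Empirical success rate of the two-state chain over `steps` trips."""
    rng = random.Random(seed)
    found_last, successes = False, 0
    for _ in range(steps):
        found_last = rng.random() < (p if found_last else 1 - q)
        successes += found_last
    return successes / steps


print(food_availability(0.7, 0.4), simulate_foraging(0.7, 0.4, 200_000))
```

Note that environments with the same mean availability can still differ in how clumped successes are: larger p and q make runs of plenty and runs of scarcity longer.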

A total of four simulated chicks hatched at one-day intervals, were fed for 20
days, then fledged. Each 24-hour day was divided into 10-minute intervals of
simulated time, during which:

1. Any egg due to hatch, hatched.
2. A bug (food) was found by the foraging parent with probability determined by
   environmental parameters p and q.
3. If a bug was found, the parent’s decision strategy was used to choose a chick to
   feed.
4. If the chosen chick had enough space in its stomach for the bug, it was fed;
   otherwise the next most preferred chick was chosen.
5. If a chick was fed, the food was added to the chick’s stomach.
6. Each chick with food in its stomach gained calories by digesting 10 minutes’
   worth of that food.
7. Every chick burned 10 minutes’ worth of calories in accordance with its
   metabolic rate, gained or lost weight according to the net calorie change, and,
   if it grew, its stomach capacity increased to accord with its new size.
8. If a chick’s weight dropped below a certain age-specific limit, it died.

Foraging and feeding took place for 14 hours out of every day. As with real
bluebirds, no food was gathered or distributed during the night, but chicks
continued to undergo steps 6 through 8.

² In reality, rather than always honestly revealing their true level of need, chick
begging strategies will have coevolved with parental provisioning strategies.
Although modelling this coevolutionary process is beyond the scope of the present
paper, it will be pursued in further work.

Fig. 1. The mean nest weight in grams achieved by each of four simple strategies
(Random, Largest, Smallest, Hungriest, respectively) across eight classes of
environment. Each strategy was simulated 500 times in each of 100 environments
{p, q} with parameters drawn from the set {.0, .1, ..., .9}. These environments are
grouped by food availability, defined as the mean percentage of successful foraging
attempts. Small differences in strategy success can be very important: a half-gram
difference in the weight of a fledged chick is a significant one. The results
depicted remain qualitatively the same for clutches of 3 or 5 chicks.

3 Results 

Not surprisingly, the number of chicks that successfully fledged increased with
increasing food availability. More interesting was the finding that the mean amount
of food available to parents had a strong effect on the success of the different
feeding rules (Fig. 1). Each of the four decision rules identified in the parental
investment literature on birds was successful for some range of environmental
“richness”. Furthermore, the manner in which the success of these strategies varied
with the abundance of food in the foraging environment is explicable from a
perspective of “parental egalitarianism”.

For relatively poor environments, in which food was found 30% of the time or less,
preferentially feeding the largest/oldest chick was the most successful decision
rule. This rule, which picks out a single chick and targets it for preferential
investment, does particularly well in environments where only a single chick can
be raised. In this sense it is non-egalitarian, rejecting impartiality in favour of
inequitable prejudice. In harsh environments, this bias is adaptive and successful.

For environments in which between 30% and 70% of parental foraging trips were
successful, preferentially feeding the smallest chicks outperformed all other
strategies. This strategy is more egalitarian than feeding the largest/oldest
because the identity of the smallest chick is not fixed, but may change over time
depending on who gets fed. If whichever chick is currently smallest is
preferentially fed, no chick is likely to remain the smallest for long. In this way
parents may achieve a limited form of equality amongst the chicks without any
complicated calculation. Parents using this strategy are letting their environment
(in this case the metabolism of their young) do some of the decision-making for
them.

For richer environments, in which food is found on between 70% and 90% of 
foraging trips, the most successful parents fed on the basis of short term need, i.e., 
they fed the hungriest chicks, as revealed by intensity of begging. Preferentially 
feeding the hungriest chicks is an even more egalitarian strategy than favouring 
the smallest. Since the degree of hunger changes rapidly within a nest, if parents 
have access to some accurate indicator of this information, and the environment 
in which they forage is rich enough to provide for an entire brood, it makes sense 
to use this information in order to evenly balance the distribution of resources 
within the nest. 

Finally, when food was even more abundant, all decision rules performed equally
well. In such cases we would expect random feeding to evolve, since it may place
fewer demands on parental decision-making mechanisms.

Preferentially feeding the youngest chicks and feeding chicks in a fixed random
order were never the most successful strategies. Thus our simulation agrees with
empirical findings insofar as only empirically observed strategies were the most
successful in any simulated environment. But to what extent do our results
concerning the manner in which the success of these strategies varies with
environmental richness also agree with experimental and observational data?

The few published reports of species switching provisioning rules as a result of
changing environmental conditions are surprisingly consistent with the findings of
the simulation model. For instance, pied flycatcher females preferentially feed the
smallest chicks under normal food conditions, but when food availability is
experimentally reduced they preferentially feed the largest [17]. When food is
plentiful, sparrowhawk mothers allocate food resources equally among chicks; when it
is scarce, they switch to a strategy that favours the largest [18].

4 Combining Cues 

All of the simple strategies explored above make decisions based only on a single 
cue (weight, hunger, or age). As field studies suggest that parents sometimes 
combine cues [19], we also tested two sets of multi-cue feeding rules. Both sets 
fed chicks in an order determined by a criterion constructed as a linear weighted 
sum of the three chick cues (weight, hunger, age). The first set of rank-based 
multi-cue strategies calculated this criterion on the basis of a chick’s rank-order 
within the nest. For every feeding decision made by a parent bird, all chicks 
were ranked according to each cue, and a chick’s ranks were summed after being 
weighted according to the parental strategy. The parent then preferentially fed 
chicks scoring highest on this combined cue criterion. 
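The rank-based criterion can be sketched directly. The text does not say in which direction chicks are ranked or how ties are broken, so this sketch assumes that rank 1 is the lowest value of a cue within the nest (so a positive weight favours chicks with high values) and that ties keep the nest's listing order. The class and function names and the example nest are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

CUES = ("weight", "hunger", "age")


@dataclass
class ChickCues:
    name: str
    weight: float   # grams
    hunger: float   # begging intensity, assumed to be an honest signal of need
    age: float      # days since hatching


def cue_ranks(chicks: List[ChickCues], cue: str) -> List[int]:
    """Rank of each chick on one cue: 1 = lowest value in the nest."""
    order = sorted(range(len(chicks)), key=lambda k: getattr(chicks[k], cue))
    ranks = [0] * len(chicks)
    for position, k in enumerate(order, start=1):
        ranks[k] = position
    return ranks


def rank_based_order(chicks: List[ChickCues], weights: Dict[str, int]) -> List[str]:
    """Feeding order: highest weighted sum of within-nest ranks first."""
    per_cue = {cue: cue_ranks(chicks, cue) for cue in CUES}
    score = [sum(weights.get(cue, 0) * per_cue[cue][k] for cue in CUES)
             for k in range(len(chicks))]
    best_first = sorted(range(len(chicks)), key=lambda k: -score[k])
    return [chicks[k].name for k in best_first]


nest = [ChickCues("A", 20.0, 0.2, 4), ChickCues("B", 12.0, 0.9, 3),
        ChickCues("C", 6.0, 0.5, 2), ChickCues("D", 3.0, 0.7, 1)]
print(rank_based_order(nest, {"hunger": 1}))    # behaves like "feed the hungriest"
print(rank_based_order(nest, {"weight": -1}))   # behaves like "feed the smallest"
print(rank_based_order(nest, {"age": 1}))       # behaves like "feed the oldest"
```

Putting all weight on a single cue recovers the corresponding single-cue rule, which is why the multi-cue strategies can be read as refinements of the simple ones.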

The second set of real-valued multi-cue strategies weighted and summed the real cue
values (e.g., weight in grams) rather than merely the chicks’ ranks. We assessed both
sets of strategies bearing in mind that the real values employed by the second set
of strategies might not be realistically assessable by parent birds, whereas
ascertaining rank-orders is more likely to be within their ability.

Fig. 2. The mean nest weight (in grams) achieved by the best of the 2197 possible
mixed-cue strategies with cue weights drawn from the set {-6, -5, ..., 5, 6}. Each
strategy was simulated 100 times in each of eight selected environments differing
in mean food availability, but not variance. The performance of the best simple
decision rule and of random feeding is shown for comparison. Columns show Random,
Best Simple, Best Linear Weighted Ranks, and Best Linear Weighted Reals,
respectively.
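The strategy space and the real-valued criterion can be made concrete in a few lines. Drawing one weight per cue from the 13 integers in {-6, ..., 6} gives 13³ = 2197 combinations, matching the count in the Fig. 2 caption. The scoring function below is a minimal sketch that assumes the weights multiply raw cue values in grams, begging intensity and days; the names and the two-chick example are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

WEIGHT_SET = range(-6, 7)                              # {-6, -5, ..., 5, 6}
STRATEGIES = list(product(WEIGHT_SET, repeat=3))       # (w_weight, w_hunger, w_age)


def real_valued_score(cues: Dict[str, float], w: Tuple[int, int, int]) -> float:
    """Weighted sum of raw cue values (grams, begging intensity, days)."""
    return w[0] * cues["weight"] + w[1] * cues["hunger"] + w[2] * cues["age"]


def real_valued_order(nest: Dict[str, Dict[str, float]],
                      w: Tuple[int, int, int]) -> List[str]:
    """Feeding order under one real-valued strategy: highest score first."""
    return sorted(nest, key=lambda name: -real_valued_score(nest[name], w))


nest = {"A": {"weight": 20.0, "hunger": 0.2, "age": 4.0},
        "B": {"weight": 19.0, "hunger": 0.3, "age": 1.0}}
print(len(STRATEGIES), real_valued_order(nest, (1, 0, -1)))
```

In this example chick A is both heavier and older, so a rank-based rule with the same weights (+1 on weight, -1 on age) would give both chicks a score of zero; the real-valued rule instead sees that B is almost as heavy but much younger and feeds it first. Sensitivity to magnitudes like this is what the real-valued strategies gain, at the cost of requiring information parents may not be able to assess.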

Not surprisingly, for each of the environments tested, there was some subset of both
the rank-based and real-valued mixed-cue rules that outperformed the single-cue
rules (Fig. 2). Whilst the best real-valued strategies also outperformed the most
successful of their rank-based cousins, the two did not differ qualitatively: both
employed roughly the same weights. Furthermore, the degree of egalitarianism
exhibited by the most successful mixed-cue rules increased with environmental
quality, just as it had done for the single-cue rules.

The strategies employed by the best multi-cue decision-makers can, for the most
part, be regarded as refinements of those employed by the best one-cue
decision-makers. In poor environments, strategies weighted Age positively. In rich
environments, Hunger was weighted positively. Intermediate environments favoured
strategies which either preferred small chicks or achieved an intermediate degree
of egalitarianism by balancing opposing cues (e.g., feeding young and heavy chicks).
The ability of multi-cue strategies to balance opposing cues in this way can explain
their increased performance.



5 Optimal Investment? 

So far we have used the performance of random feeding behaviour as a lower 
bound against which to compare the utility of the parental investment strategies 
we have assessed. However, without knowing the maximum possible success that 
is attainable in each environment, it is difficult to assess how well these decision 
rules perform in an absolute sense. As was noted during the discussion of parental 
investment modelling, in practice finding this upper bound on performance is 
intractable. We have attempted to approximate the results of optimal investment 
by assessing feeding rules that integrate as much relevant information into their 
decisions as is computationally tractable, whether or not such strategies would 
be realistically achievable by parent birds. 
We implemented three short-term optimisation strategies and assessed their
performance across the same range of environments used in the previous
simulations. Each optimisation strategy relied on knowledge of the equations
underlying chick metabolism and growth, the character of the stochasticity in the
environment, and the exact state of each chick in the nest. These strategies then
utilised this knowledge to make reasonable guesses about the future and thereby
construct a limited portion of the full decision tree which characterised the
investment problem that they faced.

Under the first maximising strategy (Next Bug), the current bug is offered to the
chick whose eating of it would maximise the total weight of all chicks in the nest
at the time the next bug is expected to be found. The second strategy (Two Bugs) is
identical to the first except that it maximises nest weight at the time the second
subsequent bug is expected to be found. The third strategy (1-10 Bugs) maximises the
short-term expected value of nest weight, that is, the sum, over the next 10
time-steps, of the probability of finding the next food item multiplied by the
predicted total nest weight at that time-step. This third strategy thus copes with
variance in the interval between finding bugs. Surprisingly, not only did all three
strategies perform worse than the multi-cue and single-cue rules, but by and large
they performed worse than feeding chicks at random. In addition, the most
sophisticated optimisation strategy outperformed its less complex relatives in only
one environment.
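The expected-value criterion of the 1-10 Bugs strategy can be sketched as follows. Two points are assumptions: "the probability of finding the next food item" at step t is read as the probability that the next bug is first found exactly t steps ahead, computed from a two-state foraging chain (after a success the next trip succeeds with probability p; after a failure it fails again with probability q); and `predict(chick, t)`, the predicted total nest weight t steps ahead if the current bug goes to `chick`, is a hypothetical stand-in for the model's growth equations.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def first_find_probabilities(p: float, q: float, found_last: bool,
                             horizon: int = 10) -> List[float]:
    """P(the next bug is first found exactly t steps ahead), for t = 1..horizon."""
    probs, not_yet = [], 1.0
    for t in range(1, horizon + 1):
        # Only the first step can follow a success; later steps follow failures.
        succeed = (p if found_last else 1 - q) if t == 1 else 1 - q
        probs.append(not_yet * succeed)
        not_yet *= 1 - succeed
    return probs


def choose_chick(candidates: Sequence[T], predict: Callable[[T, int], float],
                 p: float, q: float, found_last: bool, horizon: int = 10) -> T:
    """Pick the chick maximising sum_t P(first find at t) * predicted nest weight at t."""
    probs = first_find_probabilities(p, q, found_last, horizon)

    def expected(chick: T) -> float:
        return sum(pr * predict(chick, t) for t, pr in enumerate(probs, start=1))

    return max(candidates, key=expected)


# Toy predictor: feeding "B" now is assumed to make the nest grow faster.
toy = lambda chick, t: 10.0 + (2.0 if chick == "B" else 1.0) * t
print(choose_chick(["A", "B"], toy, p=0.5, q=0.5, found_last=False))
```

The probabilities over the 10-step window sum to less than one whenever a bug might not turn up within it, so whatever happens beyond 100 minutes simply does not enter the criterion.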

These three strategies are far more complex than the successful simple decision
rules. They require knowledge that actual parents are unlikely to possess and could
not directly assess. They integrate this information in an attempt to determine the
best possible decision to make, and yet, despite all that, they make terrible
decisions. Why are the simple strategies so much more successful than their
complicated, computationally expensive competitors?

A general answer to this kind of question is that the provisioning problem faced by
many parents involves long-term dependencies, which ensure that actions that are
successful in the short term may have a catastrophic impact in the longer run. In
the current context, one form of long-term dependency is the difference between day
and night. Even the most far-sighted of the short-term optimisation rules that we
tested could only base its behaviour on what it expected to happen in the next 100
minutes. For most of every day, such a strategist plans and acts in blissful
ignorance of the coming dusk. Whilst the short-term future anticipated by such a
strategist may look bright, the fact that no foraging can be undertaken during the
10 hours of darkness ensures that any optimism may be misguided.
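The phrase "for most of every day" can be quantified with simple arithmetic. The sketch assumes that a planner whose look-ahead is 10 steps of 10 minutes can only "see" dusk when dusk falls within its window, that is, during the final 100 minutes of the 14-hour foraging day.

```python
STEP_MIN = 10            # one simulated interval, in minutes
HORIZON_STEPS = 10       # look-ahead of the 1-10 Bugs strategy
DAYLIGHT_MIN = 14 * 60   # foraging day
NIGHT_MIN = 10 * 60      # darkness, during which no food arrives

horizon_min = STEP_MIN * HORIZON_STEPS
# Dusk falls inside the look-ahead window only during the final
# `horizon_min` minutes of daylight; for the rest of the day it is invisible.
fraction_dusk_visible = horizon_min / DAYLIGHT_MIN
print(f"look-ahead {horizon_min} min, night {NIGHT_MIN} min, "
      f"dusk visible during {fraction_dusk_visible:.1%} of daylight")
```

Dusk is visible for only about an eighth of the foraging day, and even then the planner sees at most the first 100 of the 600 minutes of darkness.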

The hypothesis that the poor performance of these short-term optimisers is due
entirely to the unforeseen effect of night falling was tested by shortening the
simulation’s hours of darkness from 10 to zero. As predicted, in this “land of the
midnight sun” the performance of the optimising strategies increased, but only to
around that of feeding in random order (Fig. 3). In no environment did an optimising
strategy outperform the best simple rule. Clearly the parental investment problem
features residual long-term dependencies which continue to defeat short-term
optimisation.

Fig. 3. Columns show the mean nest weight achieved by Random, Next Bug, Two Bugs,
1-10 Bugs, and Best Simple, respectively, over 500 runs in eight environments in
which each day comprised 14 hours of daylight and an instantaneous period of night.

As these results suggest, the problem is wider than dealing with the cycle 
of night and day. Many of the decision-making problems faced by humans and 
other animals exhibit similar kinds of long-term dependencies which invalidate 
short-term optimisation approaches. The stock market, for example, has proven 
to be extremely resistant to short-term optimisation approaches largely because 
of the long-term dependencies and stochasticity which characterise it. However, 
this does not stop simple rules of thumb such as “only invest in well-known 
companies” from making money [3]. 

The failure of short-term optimisation in these contexts might therefore be 
considered to be symptomatic of a general failure of normative models to apply 
unproblematically to realistic decision-making problems [20]. Whilst simplified 
models of these problems may be tractable, adding even the degree of realism 
modelled here renders them inapplicable in their full-blown form, and hard to 
approximate using attenuated short-term versions. In this sense, the success of 
the simple rules which were explored in this study cannot be accounted for by 
claiming that they approximate some optimal solution to the problem they face, 
since no such optimal solution, nor any successful approximation to it, can be 
presented. 

6 Conclusion 

Unlike Becker’s unboundedly rational parents, discussed earlier, our results suggest
that parents do not have to carry cumbersome investment equations in their heads.
The unsophisticated feeding decision rules we present here were successful despite
their irrational simplicity. Combined with an understanding of the role of parental
egalitarianism, these rules allow parental investment decisions to be robust with
respect to changes in environmental quality. Indeed, within our simulation, it is
only through the use of a small set of environmentally contingent decision rules
that parents can be most successful.

Although empirical studies demonstrate a diversity of parental feeding rules, 
to date there has been very little information on the relation between these rules 
and the richness of the environments in which they operate. Our findings suggest 
that the notion of parental egalitarianism coupled with an understanding of the 
simple rules which parents may use to vary this egalitarianism can be used to 
better understand this body of empirical data. 

The results of this research suggest not only that the state of the environment 
strongly affects the success of parental investment strategies, but also that even 
under these complex conditions, strategies do not have to be complex to be 
successful. 

Acknowledgements 

This paper benefited from the comments of Jason Noble, Henrietta Wilson, two 
anonymous referees, and the programming skills of Martin Dieringer. 



References 

1. Kahneman, D., Slovic, P., Tversky, A.: Judgement under uncertainty: Heuristics 
and biases. CUP, New York (1982) 

2. Brooks, R.A.: New approaches to robotics. Science 253 (1991) 1227-1232 

3. Gigerenzer, G., Todd, P.M., the ABC Group: Simple heuristics that make us smart. 
OUP, New York (1999) 

4. Webb, B.: A robot cricket. Science 275 (1996) 62-67 

5. Green, R.F.: Stopping rules for optimal foragers. Am. Nat. 123 (1984) 30-43 

6. Becker, G.S.: A treatise on the family. HUP, Cambridge, MA (1991) 

7. Lack, D.: The natural regulation of animal numbers. OUP, Oxford (1954) 

8. Parker, G.A.: Models of parent-offspring conflict. V. Effects of the behaviour of 
the two parents. Anim. Behav. 33 (1985) 519-533 

9. Godfray, H.C.J.: Signaling of need by offspring to their parents. Nature 352 (1991) 
328-330 

10. Winkler, D.W.: A general model for parental care. Am. Nat. 130 (1987) 526-534 

11. Horsfall, J.A.: Brood reduction and brood division in coots. Anim. Behav. 32 
(1984) 216-225 

12. Mondloch, C.J.: Chick hunger and begging affect parental allocation of feeding in 
pigeons. Anim. Behav. 49 (1995) 601-613 

13. Martins, T.L.F., Wright, J.: Brood reduction in response to manipulated brood 
sizes in the common swift. Behav. Ecol. and Soc. 32 (1993) 61-70 

14. Rydén, O., Bengtsson, H.: Differential begging and locomotory behavior by early 
and late hatched nestlings affecting the distribution of food in asynchronously 
hatched broods of altricial birds. Zeitschrift für Tierpsychologie 53 (1980) 291-303 

15. Mock, P.J., Khubesrian, M., Larcheveque, D.M.: Energetics of growth and matu- 
ration in sympatric passerines that fledge at different ages. Auk 108 (1991) 34-41 

16. Dykstra, C.R., Karasov, W.H.: Daily energy expenditure by nestling house wrens. 
Condor 95 (1993) 1028-1030 

17. Gottlander, K.: Parental feeding behavior and sibling competition in the pied 
flycatcher, Ficedula hypoleuca. Ornis Scandinavica 18 (1997) 269-276 

18. Newton, I.: Feeding and development of the sparrowhawk, Accipiter nisus, 
nestlings. J. Zool. (London) 184 (1978) 465-487 

19. Kölliker, M., Richner, H., Werner, I., Heeb, P.: Begging signals and biparental 
care: nestling choice between parental feeding locations. Anim. Behav. 53 (1998) 
215-222 

20. Simon, H.A.: Invariants of human behavior. Ann. Rev. Psych. 41 (1990) 1-19 




Imitation and Cooperation in Coupled Dynamical Recognizers 



Takashi Ikegami 1) and Makoto Taiji 2) 

1) Institute of Physics, The Graduate School of Arts and Sciences, 
University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153, Japan 
ikeg@sacral.c.u-tokyo.ac.jp 
2) Institute of Statistical Mathematics, 
4-6-7 Minami-Azabu, Minato-ku, Tokyo 106 
taiji@ism.ac.jp 



Abstract. A coupled dynamical recognizer is proposed as a model for 
simulating intelligent game players who can imitate the other player's 
behavior. A kind of recurrent neural network called a dynamical recognizer 
is used as an internal model of the other player to imitate its behavior. 
The Rashevskyan game is examined, in which each player moves along a 
separate spatial axis to take an advantageous position over the other 
player. Though the players are egocentric in principle, it is shown that 
some altruistic behavior emerges as a dynamical attractor phase. This 
altruistic behavior is not attainable by continually modeling the opponent 
merely as a Tit for Tat player. Rather, players have to dynamically change 
their model of imitation to achieve mutual co-operation; otherwise they go 
to a static non-cooperative Nash solution. Enhancement of a minute difference 
in players' action patterns, called the pragmatic paradox, is the key issue 
throughout this paper. 



1 Introduction 

Classical game theory generally requires a level of rationality much higher 
than natural rationality. For example, Anderlini (1990) showed that a Nash 
equilibrium is not guaranteed even with universal Turing machine players. To 
overcome such over-rationality, a kind of bounded rationality has been proposed 
and studied by Rubinstein (1986) and others (see e.g. Binmore 1987) using finite 
automata models. More recently, epistemic logic has been developed to study what 
beliefs and inference abilities are required of rational players (Bacharach 1997). 

The above studies are mainly concerned with playing matrix games using static 
solutions. Another type of game theory, in contrast, is concerned more with the 
dynamic aspects of game interactions. It was first introduced by Rashevsky (1947) 
and Rapoport (1947), and has recently been developed further by Rössler (1994), 
Akiyama (1998) and the authors of the present paper (1998, 1999). These approaches 
can be called the Rashevskyan, or dynamical systems, game. In the following, this 
approach to game theory is introduced, and a new understanding of dynamic 
co-operative behavior is proposed. We suggest that the approach could be developed 
into a theory of developmental psychology and language, and the paper concludes 
with a discussion of this on-going work. 

2 Coupled Dynamical Recognizers 

A kind of neural network with recurrent interactions has been used to imitate 
human language performance (Elman 1990), to mimic finite automaton behavior 
(Pollack 1991), and to control robot navigation (Tani 1995). Following Pollack, 
we call the network a dynamical recognizer. 

Recently, we proposed a coupled dynamical recognizer to simulate two agents 
playing the prisoner's dilemma game (Taiji and Ikegami 1999; Ikegami and Taiji 
1998). Each agent generates an internal model of the other agent that best 
imitates the other's past behavior patterns. The model is expressed by dynamical 
recognizers. Based on the best internal model, players choose their next action 
by anticipating future behavior and optimizing it. 

However, the prisoner's dilemma (PD) game is a simple matrix game without any 
external environment. A new problem arises when some environmental factor 
enters the agents' optimality functional (i.e. an individual's satisfaction 
value) in addition to the inputs from the other agent. This situation was first 
formalized independently by Rashevsky (1947) and Rapoport (1947). In this paper, 
we apply our coupled dynamical recognizer model to this problem. We now assume 
that each agent can move along one spatial axis in discrete time steps, so that 
the agents have to consider both players' action patterns and spatial 
information. Though these agents are egocentric players in principle, we will 
show that some altruistic behavior emerges purely as a resultant phenomenon. 



3 Descriptions of Games and Dynamics Equations 

3.1 Dubey’s spatial game 

The game we use was originally introduced by Dubey (1986). First, consider a 
game with two players, where each player has an optimal functional consisting 
of two parts: one a function of its own position $x$ and the other a function 
of the other player's position $y$. Given a home position at $(a, b)$ for 
player 1, the optimal functional $U_1$ of this player is given by 

$$U_1(x, y) = -\left[(x - a)^2 + (y - b)^2\right]. \qquad (1)$$

The egocentric player 1 is most rewarded when $x = a$ and $y = b$. Similarly, 
given the other player's optimal functional 
$U_2(x, y) = -\left[(x - c)^2 + (y - d)^2\right]$, the egocentric player 2 is 
most rewarded when $x = c$ and $y = d$. If $a \neq c$ and $b \neq d$, classical 
game theory indicates that there is a unique Nash equilibrium of this game, 
i.e. $(x = a, y = d)$, assuming that the strategy set of each player is a point 
on its own axis.¹ 
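As a minimal sketch of Dubey's game, the functionals below use the home positions given in Section 4, $(a, b) = (0.1, 0.9)$ and $(c, d) = (0.9, 0.1)$. Because player 1 controls only $x$ and player 2 only $y$, a grid search over each player's own coordinate recovers the Nash point $(a, d) = (0.1, 0.1)$. The function names and the grid search are illustrative choices, not the authors' code.

```python
from typing import Tuple

A, B = 0.1, 0.9  # player 1's home position (a, b), taken from Section 4
C, D = 0.9, 0.1  # player 2's home position (c, d), taken from Section 4


def u1(x: float, y: float) -> float:
    """Optimal functional of player 1, Eq. (1)."""
    return -((x - A) ** 2 + (y - B) ** 2)


def u2(x: float, y: float) -> float:
    """Optimal functional of player 2."""
    return -((x - C) ** 2 + (y - D) ** 2)


def best_response_grid(step: float = 0.001) -> Tuple[float, float]:
    """Each player optimises only its own coordinate (player 1: x, player 2: y).
    The functionals are separable, so the best response does not depend on
    the other coordinate; any fixed value (here 0.5) gives the same answer."""
    grid = [i * step for i in range(int(round(1 / step)) + 1)]
    x_star = max(grid, key=lambda x: u1(x, 0.5))
    y_star = max(grid, key=lambda y: u2(0.5, y))
    return x_star, y_star


def on_pareto_line(x: float, y: float, tol: float = 1e-9) -> bool:
    """Pareto-efficient points form the segment joining (a, b) and (c, d);
    with these homes that is x + y = 1 with a <= x <= c."""
    return abs(x + y - 1.0) < tol and A <= x <= C


if __name__ == "__main__":
    nash = best_response_grid()
    print("Nash point:", nash)
    print("payoffs at Nash:", u1(*nash), u2(*nash))
    print("payoffs at fair Pareto point:", u1(0.5, 0.5), u2(0.5, 0.5))
```

At the fair Pareto point (0.5, 0.5) both functionals equal -0.32, better than the -0.64 each player obtains at the Nash point (0.1, 0.1); this gap is what the dynamics described below has to bridge.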

This game is converted into the Rashevskyan game if 1) each player is assumed 
to announce the numerical value of his or her position at each time step, 2) each 
player can memorize the other player's action sequences, and 3) each player can 
compute the optimal functional at that position and decide to step forward (away 
from its home position) or backward (toward it) at the next time step. We thus 
assume a minimal kind of intelligence for our players: memory and inference 
capability. 



3.2 Dynamical recognizer as a model of the opponent 

Two "intelligent" players play Dubey's spatial game, each imitating the other's 
behavior with a dynamical recognizer. The dynamical recognizer used here is a 
simple two-layered network with 12 connections, which is decomposed into context 
and function networks. As Pollack (1991) first explicitly showed, a dynamical 
recognizer can imitate the behavior of some finite automata. To see how well a 
dynamical recognizer has learned a given automaton, we examine geometrical 
patterns in the "context space", i.e., a plot of output neuron states for all 
possible input states. If a dynamical recognizer successfully imitates the given 
finite automaton, the context space plot shows a finite number of islands of 
clusters, and a clear correspondence is observed between each cluster and a node 
of the finite automaton. When the dynamical recognizer fails to imitate the 
automaton, or when the opponent is not a finite automaton, the context space plot 
shows a stretched and folded, fractal-like structure. 

We use this dynamical recognizer to mimic the opponent's behavior. In each game 
round, both players 1) generate a model of the opponent that best mimics the 
opponent's past moves, 2) anticipate the opponent's future moves, and 3) choose 
as their next action the one expected to induce the best future outcome. These 
three steps are repeated in each game round; we explain them in detail below. 

1) The past behavior of the opponent is mimicked by the prescribed dynamical 
recognizers. We use only two input neurons ($y_0$ and $y_1$) and three output 
neurons ($z_0$, $z_1$ and $z_2$). One of the input neurons, $y_1$, is fixed in its 
state as a bias input,² while two output neurons, called recurrent outputs 
($z_1$ and $z_2$), are fed back recurrently to determine the function network. 
The dynamics of the context and function networks at time step $n$ are expressed 
by the following equations: 

$$z_i(n) = g\Big(\sum_{j=0}^{M} w_{ij}\, y_j(n)\Big), \qquad (2)$$

$$w_{ij} = \sum_{k=1}^{N} u_{ijk}\, z_k(n-1), \qquad (3)$$

where $w_{ij}$ denotes the weight of the function network determined by the 
context network with its backbone context weights $u_{ijk}$ (here $M = 1$ and 
$N = 2$, giving $3 \times 2 \times 2 = 12$ context weights). In the equations 
above, nonlinearity exists only in the sigmoid function 
$g(x) = 1/(1 + e^{-\beta x})$. We can control the nonlinearity by changing the 
parameter $\beta$ (throughout this paper, we set $\beta = 6.0$). 

¹ As long as one player sticks to a point, there is no reason for the other 
player to deviate from it either; such a point is called a Nash equilibrium 
point. If no deviation from a point allows both players to gain an advantage 
simultaneously, the point is called Pareto efficient. In this game, the Pareto 
efficient points form the line segment joining (a, b) and (c, d) (Dubey 1986). 

² When a player is on the left-hand side of his or her home position, we set 
$y_1 = 0.3$; otherwise we set $y_1 = 0.7$. These values are fixed throughout the 
simulation. 

For the purposes of the present study, only one input and one output neuron are 
needed to encode actions, since this game has only two actions: stepping forward 
and stepping backward relative to the home position. One input neuron, $y_0(n)$, 
specifies the imitator's own action, and one output neuron, $z_0(n)$, specifies the 
associated action of the imitated player. The output is rounded off to 0 (forward) 
or 1 (backward). 
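A minimal Python sketch of one forward pass through Eqs. (2) and (3), with $M = 1$ (inputs $y_0$, $y_1$) and $N = 2$ (recurrent outputs $z_1$, $z_2$). The weight layout `u[i][j][k]`, the 0/1 encoding of the imitator's action on $y_0$, and the initial recurrent state (0.5, 0.5) are assumptions made for illustration; the paper does not specify them.

```python
import math
from typing import List, Sequence, Tuple

BETA = 6.0  # sigmoid gain used throughout the paper


def g(x: float) -> float:
    """Sigmoid g(x) = 1 / (1 + exp(-BETA * x))."""
    return 1.0 / (1.0 + math.exp(-BETA * x))


def step(u, z_prev: Sequence[float], y: Sequence[float]) -> List[float]:
    """One update of the recognizer at time step n.

    u[i][j][k]: context weight linking recurrent output z_{k+1}(n-1) to the
                function-network weight w_ij (i = 0..2, j = 0..1, k = 0..1).
    z_prev:     recurrent outputs (z_1, z_2) from step n-1.
    y:          inputs (y_0, y_1) at step n.
    Returns (z_0, z_1, z_2) at step n.
    """
    z = []
    for i in range(3):
        # Eq. (3): the context network sets the function-network weights.
        w = [sum(u[i][j][k] * z_prev[k] for k in range(2)) for j in range(2)]
        # Eq. (2): the function network maps the inputs to output i.
        z.append(g(sum(w[j] * y[j] for j in range(2))))
    return z


def run(u, own_actions: Sequence[int], bias: Sequence[float],
        z_init: Tuple[float, float] = (0.5, 0.5)) -> List[int]:
    """Feed the imitator's own actions (0 = forward, 1 = backward) on y_0 and
    the spatial bias (0.3 or 0.7) on y_1; return the rounded z_0 at each step
    as the predicted action of the imitated player. z_init is an assumption."""
    z_rec = list(z_init)
    predictions = []
    for action, b in zip(own_actions, bias):
        z = step(u, z_rec, (float(action), b))
        predictions.append(1 if z[0] >= 0.5 else 0)  # round z_0 to 0 or 1
        z_rec = z[1:]  # z_1 and z_2 are fed back to the context network
    return predictions
```

For example, setting every weight to +1 makes all weighted sums positive (inputs and outputs are non-negative), so the network always predicts 1; setting every weight to -1 makes it always predict 0.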

The degree to which the obtained dynamical recognizer can imitate the real 
opponent is measured by a weighted sum over past behavior patterns. In practice, 
the error $E(n)$ after the $n$-th game is computed by 

$$E(n) = \sum_{k=1}^{n} \lambda^{\,n-k} \big(z_0(k) - d(k)\big)^2, \qquad (4)$$

where $d(k)$ is the opponent's actual action in the $k$-th game, $z_0(k)$ is the 
action predicted by the network, and $\lambda$ is a parameter that controls 
forgetfulness. For most simulations, $\lambda = 0.95$ was used. 
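Eq. (4) is a geometrically discounted squared error. A direct transcription (the function name is hypothetical) makes the weighting explicit: the latest game counts fully, and a mismatch m games before the latest one is scaled by $\lambda^m$.

```python
from typing import Sequence


def imitation_error(predicted: Sequence[float], actual: Sequence[int],
                    lam: float = 0.95) -> float:
    """Eq. (4): E(n) = sum_{k=1}^{n} lam**(n-k) * (z0(k) - d(k))**2.

    predicted[k-1] holds z0(k) and actual[k-1] holds d(k)."""
    n = len(actual)
    if len(predicted) != n:
        raise ValueError("predicted and actual must have the same length")
    return sum(lam ** (n - k) * (predicted[k - 1] - actual[k - 1]) ** 2
               for k in range(1, n + 1))
```

With $\lambda = 0.95$, a single wrong prediction two games before the latest one contributes $0.95^2 = 0.9025$, while the same error in the latest game contributes 1.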

In the present paper, we restrict the strength of the connections $u_{ijk}$ to 
either -1 or 1. We can therefore exhaustively search all structures to arrive at 
those that best mimic the opponent's behavior. Since the context weights do not 
take continuous values, the class of languages the networks can express is 
severely limited. However, we find that not only finite automata but also many 
non-finite automata are expressed by these networks. In each game round, we carry 
out an exhaustive search over all possible structures to update the model of the 
opponent. 
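Because each of the 12 context weights is restricted to ±1, there are only $2^{12} = 4096$ structures, so the exhaustive search is cheap. The sketch below restates the recognizer compactly so that it runs on its own and ranks every structure by Eq. (4). Scoring on the raw (unrounded) $z_0$, the weight layout and the initial state (0.5, 0.5) are assumptions, not details given in the paper.

```python
import itertools
import math
from typing import List, Sequence, Tuple

BETA = 6.0     # sigmoid gain
LAMBDA = 0.95  # forgetfulness in Eq. (4)


def predict(u, own: Sequence[int], bias: Sequence[float]) -> List[float]:
    """Raw z_0 outputs of a recognizer with context weights u[i][j][k]
    (Eqs. (2)-(3)); the initial recurrent state (0.5, 0.5) is assumed."""
    z_rec = (0.5, 0.5)
    out = []
    for a, b in zip(own, bias):
        y = (float(a), b)
        z = []
        for i in range(3):
            w = [u[i][j][0] * z_rec[0] + u[i][j][1] * z_rec[1] for j in range(2)]
            z.append(1.0 / (1.0 + math.exp(-BETA * (w[0] * y[0] + w[1] * y[1]))))
        out.append(z[0])
        z_rec = (z[1], z[2])
    return out


def error(pred: Sequence[float], actual: Sequence[int], lam: float = LAMBDA) -> float:
    """Eq. (4) with k = t + 1: weight lam**(n - k) on the k-th squared error."""
    n = len(actual)
    return sum(lam ** (n - 1 - t) * (p - d) ** 2
               for t, (p, d) in enumerate(zip(pred, actual)))


def all_structures():
    """Every u with entries in {-1, +1}: 3 x 2 x 2 = 12 weights, 4096 structures."""
    for bits in itertools.product((-1, 1), repeat=12):
        yield [[[bits[4 * i + 2 * j + k] for k in range(2)]
                for j in range(2)] for i in range(3)]


def rank_models(own: Sequence[int], bias: Sequence[float],
                opponent: Sequence[int]) -> List[Tuple[float, list]]:
    """Score every structure on the observed history, best (lowest E) first."""
    scored = [(error(predict(u, own, bias), opponent), u) for u in all_structures()]
    scored.sort(key=lambda pair: pair[0])
    return scored
```

The first entry of `rank_models(...)` is the best imitating model; keeping the whole ranked list makes it easy to find the near-ties that give rise to model undecidability.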

2) Using the model of the opponent, each player anticipates future actions. In 
practice, by feeding all possible combinations of its own actions over the next 
$T$ rounds (i.e. $x(t+1), x(t+2), \ldots, x(t+T)$) into the model, a player 
computes the expected reward of each trajectory from the current position 
$(x(t), y(t))$: 

$$\langle U_1(x, y) \rangle = \sum_{n=1}^{T} \omega^{n}\, U_1\big(x(t+n), y(t+n)\big), \qquad (5)$$

where the possible combinations are denoted by $x$ and the expected opponent 
positions associated with $x$ are denoted by $y$. The future weight $\omega$ is set 
at 0.7 here. The values $y(t+n)$ are computed for each future step $n$ by the model 
obtained at time step $t$. In other words, by drawing future spatial trajectories 
from the current position, players anticipate the possible reward associated with 
each trajectory up to $T$ time steps ahead. $T$ is set at 10 here. 

3) Each player chooses the action combination that is expected to bring the 
highest functional value to itself, with the future weight $\omega = 0.7$, and takes 
the first element of that action pattern (i.e. $\pm\Delta$) as its next action. 
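Steps 2) and 3) amount to a brute-force look-ahead over all $2^T = 1024$ action combinations for $T = 10$. In this sketch the opponent model is any callable mapping player 1's planned actions to predicted opponent actions, standing in for the recognizer selected in step 1). The Tit-for-Tat stand-in, the exponent $n$ on $\omega$, and the sign convention (0 = forward = $+\Delta$, 1 = backward = $-\Delta$, which holds while a player is on the far side of its home coordinate) are assumptions of this sketch.

```python
import itertools
from typing import Callable, List, Sequence, Tuple

A, B = 0.1, 0.9  # player 1's home position (a, b), Section 4
DELTA = 0.001    # unit step size
OMEGA = 0.7      # future weight
T = 10           # look-ahead horizon

OpponentModel = Callable[[Sequence[int]], List[int]]


def u1(x: float, y: float) -> float:
    """Player 1's optimal functional, Eq. (1)."""
    return -((x - A) ** 2 + (y - B) ** 2)


def move(pos: float, action: int) -> float:
    """Assumed convention: 0 = forward = +DELTA, 1 = backward = -DELTA."""
    return pos + (DELTA if action == 0 else -DELTA)


def expected_reward(x: float, y: float, plan: Sequence[int],
                    model: OpponentModel) -> float:
    """Eq. (5) along one trajectory: sum of OMEGA**n * U_1(x(t+n), y(t+n))."""
    replies = model(plan)
    total = 0.0
    for n, (mine, theirs) in enumerate(zip(plan, replies), start=1):
        x, y = move(x, mine), move(y, theirs)
        total += OMEGA ** n * u1(x, y)
    return total


def choose_action(x: float, y: float, model: OpponentModel) -> Tuple[int, float]:
    """Enumerate all 2**T plans; return the first action of the best plan and its value."""
    best = max(itertools.product((0, 1), repeat=T),
               key=lambda plan: expected_reward(x, y, plan, model))
    return best[0], expected_reward(x, y, best, model)


def tit_for_tat(plan: Sequence[int]) -> List[int]:
    """Hypothetical stand-in model: the opponent repeats player 1's previous
    action (player 1's last real action is assumed to have been forward)."""
    return [0] + list(plan[:-1])
```

Starting at the Nash point, `choose_action(0.1, 0.1, tit_for_tat)` picks 0 (forward) as the first action: against a Tit-for-Tat image, stepping forward induces the opponent to move y toward player 1's preferred value, which outweighs the small cost of leaving x = 0.1.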

We have to introduce one important notion here, called model undecidability. In 
general, if two or more networks mimic the opponent's behavior equally well, 
within an accuracy $\epsilon$ of the best imitating model, we define them as 
equivalent models. The equivalent models have different network structures from 
one another; we use $\epsilon = 0.006$ in the present simulation. This causes no 
problem if all of the equivalent models propose the same next action to a player. 
If they happen to propose different ones, however, the game becomes undecidable. 
Instead of introducing a criterion for model selection, we branch the world line: 
when such undecidability occurs, we follow every possible trajectory of the game 
dynamics. If the dynamics meets many such undecidabilities, many trajectories 
branch out. These trajectories are visualized as spatial trails on a 
two-dimensional plane, as described in the next section. 
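Model undecidability can be expressed as a filter over the ranked models: keep every model within $\epsilon$ of the best error and collect the next actions they propose. This sketch assumes each candidate has already been reduced to an (error, proposed action) pair; the type and function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

EPSILON = 0.006  # accuracy used in the paper's simulations

Candidate = Tuple[float, int]  # (imitation error E(n), next action the model proposes)


def equivalent_models(candidates: Sequence[Candidate],
                      eps: float = EPSILON) -> List[Candidate]:
    """Models whose error lies within eps of the best imitating model."""
    best = min(err for err, _ in candidates)
    return [(err, act) for err, act in candidates if err - best <= eps]


def proposed_actions(candidates: Sequence[Candidate],
                     eps: float = EPSILON) -> Set[int]:
    """Distinct next actions proposed by the equivalent models. A single action
    means the round is decidable; two actions mean the game is undecidable and
    the world line branches, one trajectory per action."""
    return {act for _, act in equivalent_models(candidates, eps)}
```

For instance, candidates with errors 0.100 (action 0) and 0.104 (action 1) are equivalent and propose different actions, so the trajectory branches; if the second error were 0.110, only action 0 would remain.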

4 Observation and Simulation 

We assume that player 1 has its home position at (0.1, 0.9) and player 2 at 
(0.9, 0.1). Starting from points close to the Nash solution (0.1, 0.1), player 1 
moves along the x axis and player 2 moves along the y axis with a unit step size 
Δ = 0.001, so the combined point (x, y) moves on the two-dimensional plane with 
minimal scale Δ. An initial set of action sequences is given to the players, either 
randomly or in specific patterns. In each game round, players gain one new bit of 
information about their opponent's behavioral strategy and can update their 
models. Model undecidability is allowed only for the first 100 time steps. All 
branching trajectories from the first 100 time steps are investigated, and 
successive branchings from already branched trajectories are all taken into 
account. 

4.1 A bifurcation of trajectories 

First, we show (Fig. 1) an ensemble of all branching trajectories from the same 
initial condition on the two-dimensional plane (x, y). Starting from the Nash 
equilibrium point, most trajectories soon come back to the Nash point if we do not 
allow any model undecidability, and the players remain around the Nash point. With 
a finite amount of model undecidability ε, the situation changes drastically. Some 
trajectories deviate from the Nash point to make loops, as in Fig. 1, most of which 
finally settle down to the Nash point. Yet some trajectories emerge that leave the 
vicinity of the Nash point to reach the Pareto efficient line (i.e. x + y = 1). In 
this example, the time averages of the players' positions come close to the fair 
Pareto efficient solution (0.5, 0.5). Fig. 2 shows the two attractors of this system 
with respect to the time evolution of player 1's position. 

Fig. 1. Trajectories of the game dynamics displayed as spatial trails on the 
two-dimensional plane. All trajectories from the same initial states are overlaid on 
the same plane. Players 1 and 2 move along the x and y axes, respectively. The 
trajectories that go to the Pareto efficient points appear as straight lines in this 
plane. 

It is interesting to note that a precursory phenomenon is observed in the 
trajectories. During certain periods, one player behaves as 0101010 against the other 
player's 1010101; that is, both players alternately move back and forth at every time 
step over a succession of rounds. Then both players simultaneously leave their home 
positions and start to step forward. When they have crossed the Pareto efficient line, 
they start to go back toward their home positions in a zigzag manner (i.e. 2 steps 
back, 1 step forward, and so on). After a certain period of time, they again return to 
the Pareto efficient line. This whole cycle becomes an attractor of some of the 
branched trajectories. 

At times, players show similar oscillating behaviour around the diagonal line 
perpendicular to the Pareto efficient line (i.e., the y = x line). However, in this 
case, players always deviate from the cycle to fall back to the Nash equilibrium 
point. 
Fig. 2. Overlaid plots of player 1's position as a function of time. There are clearly 
two attractors: a periodic attractor around x = 0.5 (Pareto point) and a fixed-point 
attractor around x = 0.1 (Nash point). When they fail to generate particular models, 
players cannot stay at the Pareto point and return to the Nash point. 



4.2 Dynamics in the model space 

We now analyze the situation in terms of the players' internal models. The periodic 
cycle underlying the co-operative motion decomposes into three phases (Fig. 3): 
climbing up the x = y line, climbing down the x = y line, and a phase of switching 
from the first phase to the second and back. 

After a series of rounds alternating between 1 (backward) and 0 (forward) actions, 
each player comes to 'believe' that the other is a 'Tit for Tat' player who copies its 
opponent's previous action. They then approach the Pareto efficient line by climbing 
up the x = y line. During this phase, the players come to hold the same model of each 
other, since they go through a common experience (i.e., they repeat the same action 
sequence and move along the same line). While maintaining this image, however, 
neither player can stay on the Pareto efficient line; both have to step back toward 
their home positions. A simultaneous step back is not included in the Tit for Tat 
algorithm, so the players have to change their images. Complex images are then 
generated during the switching phase, in which the players, only on average, 
gradually climb down the x = y line. 

Here the complexity of an image is measured by the number of clusters generated in 
the context space plot. The second phase then emerges, in which players hold 
"negative Tit for 2 Tats"-like images. Because of these images, at least two 
successive forward actions are needed to return to mutual co-operation. Then, at a 
certain point in time, they step forward to the Pareto efficient line by recovering 
the "Tit for Tat" image. When players fail to perform such processes, however, they 
cannot come back to the Pareto efficient behavior. An image such as negative Tit for 
2 Tats is important for anticipating the future, because it retains the hope of 
recovering mutual co-operation by means of particular action sequences. The context 
space plots of these images are shown in Fig. 4. 
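Fig. 3 counts clusters in the context space by box counting with a minimum box size of 0.1. One plausible reading, assumed here rather than taken from the paper, is to mark every 0.1 × 0.1 box that contains a context point and count the connected groups of occupied boxes (boxes touching at an edge or a corner belong to the same cluster).

```python
from typing import Iterable, Set, Tuple


def count_clusters(points: Iterable[Tuple[float, float]], box: float = 0.1) -> int:
    """Count connected groups of occupied boxes among 2-D context-space points."""
    cells: Set[Tuple[int, int]] = {(int(x // box), int(y // box)) for x, y in points}
    seen: Set[Tuple[int, int]] = set()
    clusters = 0
    for cell in cells:
        if cell in seen:
            continue
        clusters += 1            # new island found; flood-fill it
        stack = [cell]
        seen.add(cell)
        while stack:
            cx, cy = stack.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in cells and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
    return clusters
```

Under this reading, a finite-automaton-like image yields a few isolated islands, whereas the complex images of Fig. 4 (d)-(f) would register as many small, separate groups.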
Fig. 3. Time evolution of player 1's model of player 2 at the Pareto efficient 
attractor. (a) The number of clusters generated in the context space, computed by the 
box counting method with a minimum box size of 0.1. (b) Corresponding positions of 
player 1. When players start to step towards their home positions, their internal 
models become diversified. The three phases described in the text are marked in (b). 
This is a magnified view of Fig. 2 between x = 0.45 and 0.5. 




Fig. 4. Context Space plots generated at the Pareto efficient attractor. They correspond 
to the states in Fig. 3 at time step = (a) 6012, (b) 6082, (c) 6117, (d) 6038, (e) 6043 and 
(f) 6058. The patterns of (a), (b) and (c) are relatively long-lived finite automaton-like 
images. In particular, the pattern (a) corresponds to Tit for Tat and (b) to negative 
Tit for 2 Tats in terms of the IPD game. The corresponding nodes of automaton are 
marked with circles. On the other hand, the patterns of (d), (e) and (f) are described 
as complex images in the text. They are images with more than 40 distinct islands of 
clusters. 
It is worth comparing the above scheme with the iterated Prisoner's Dilemma (IPD) 
game (Axelrod 1984; Ikegami and Taiji 1998). In the IPD game, playing C immediately 
implies co-operation, so the 'Tit for Tat' image becomes the fixed point of this 
coupled recognizer system with model undecidability. In the present case, this is not 
true. Players have to deceive each other by stepping back to their home positions, 
which paradoxically sustains mutual co-operation. To do so, players have to 
temporarily hold complex, non-automaton images while switching between two distinct 
finite automaton images: Tit for Tat and negative Tit for 2 Tats. Although this 
model-switching behavior is periodic in time, the underlying model images include the 
'strange attractor' of the corresponding iterated function system, i.e. a dynamical 
recognizer. 

Similar findings have been reported in mobile-robot navigation-learning experiments 
conducted by Tani (1998). Using a recurrent network, a mobile robot incrementally 
learns that a given environment consists of walls, corners and obstacles. The image 
of the environment that the robot learns is characterized by a context space plot, as 
is the case here. The experiment demonstrated that image switching occurs between a 
simple finite automaton and a chaotic attractor. The switching phenomenon is due to 
the strong cross-coupling between the robot's real-world motion and the organizing 
recurrent network. The model switching in our case is due to the strong cross-coupling 
between two sometimes competing inputs: the other player's action sequences and the 
spatial context. 



5 Conclusions and Discussion 

So far, we have demonstrated a new type of co-operation arising in the Rashevskyan 
game. A coupled dynamical recognizer has made it possible to simulate dynamic 
co-operation among players who anticipate their future actions by making internal 
models of other players. The dynamic co-operation is sustained basically by switching 
between two finite automaton models and other, complex ones. The inability to hold a 
static and unique model of the other player is due to the context dependency of the 
internal models. It may seem counter-intuitive that players cannot make a spatially 
independent model of the other player. 

Whether and how players can generate a context-free model may depend on the 
complexity of the game as well as on the network size used to generate the internal 
model. Anti-phase synchronization in action sequences between players emerges as a 
precursory phenomenon of this mutual co-operation phase, but without any intention. 

Model undecidability is a key concept here. It arises when players cannot decide in 
which direction to move. Classical game theory has made efforts to remove this 
undecidability. Here, because of the undecidability, some branched trajectories are 
attracted to the Pareto efficient behavior. At the same time, model undecidability 
reveals the pragmatic paradox (Bateson 1972; Rössler 1994) embedded in the intelligent 
players. That is, only minute differences in past action sequences or in spatial 
context can be interpreted as critical signals used to predict and plan future 
outcomes. 

From our series of studies of coupled dynamical recognizers, a connection can be 
found between the kind of game played and the images of the other player generated 
in each player's mind. We believe that issues such as the cognitive development of 
infants (e.g. how they come to know other people's minds) can also be elucidated using 
a minimal simulation model employing a coupled dynamical recognizer of the type 
described in this paper. 
Abstract. In multi-agent systems, agents receive incomplete local information, yet 
they must achieve global tasks. We regard this ability to determine the appropriate 
action as corresponding exactly to "norms of behavior" in the human sense. In this 
study, we constructed a competitive social system consisting of selfish autonomous 
agents. Each agent had an independent evaluation table for its actions. The agents' 
strategies were adapted on the basis of their individual evaluations, so as to satisfy 
their norms of behavior. In such systems, only the agents that adapted not only their 
strategies but also their norms of behavior to the environment were able to survive. 



1 Introduction 

This paper reports on our attempt to construct a model of a multi-agent system in 
which the agents interact on the basis of selfish decision-making. The main feature 
of the proposed model is that each agent can generate its own "norms of behavior". The 
objective of our study was to clarify the behavior of a society composed of agents that 
can change their "norms of behavior" adaptively, and to investigate the process of 
problem-solving. 

In this study, we used the Iterated Prisoner's Dilemma (IPD) [4], a typical model of 
game theory, as the environment in which the agents interact. Using simple alternative 
action selection mechanisms, the agents recognize both the conditions they should obey 
and the strategies of the other agents as their environment. A system is then 
constructed in which each agent has its own norms of behavior. Computer simulations 
show that self-preservation and cooperative behavior are acquired adaptively. We 
consider that introducing independent norms of behavior (as an internal payoff matrix) 
into each agent leads from a social dilemma situation to an ordered society. 

In the original IPD games, all interacting agents have the same norms of behavior. In 
contrast, in the proposed model, the agents have free norms of behavior. Because each 
agent possesses the freedom to react to a single phenomenon with various norms of 
behavior, we expected various functions to emerge. Hypergame analysis [5] is an example 
of this concept. The proposed model, however, is fundamentally different from hypergame 
analysis because it includes adaptive search by each agent for its norms of behavior. 
2 Social Dilemma on Multi-agent Model 



Each of the N agents must decide whether to move a piece of cargo at each time step. 
When an agent carries a piece of cargo, it expends energy (E). After that, all of the 
agents receive a reward (Ri, where i is the number of pieces of cargo moved). Note that 
agents who rested receive the same energy from the environment as agents who worked. 
Rewards are paid as energy (Fig. 1, left). 

[Figure: Energy Map for IPD Environment; payoff plotted against the number of agents who cooperated] 

Fig. 1. Prisoner's Dilemma with Multi-Agent Model and Payoff Graph of N-Person IPD. This 
chart indicates that choosing "D" offers a better payoff than choosing "C", even though 
the number of cooperating agents is decreased by the agent's defection. This rationality 
corresponds to condition (1) of the Iterated Prisoner's Dilemma. However, if all of the 
agents choose "D" based on this rationality, the payoff received by each agent is much 
smaller than that received if all of the agents cooperate. This dilemma corresponds to 
condition (2). 



Then, by representing the moving of a piece of cargo by an agent as “C” 
(Cooperation), and the failure to move a piece of cargo as “D” (Defection), the 
situation of this multi-agent system can be considered to correspond to the “Iterated 
Prisoner’s Dilemma”. That is to say, the payoff table of the system includes the 
following dilemma. 

(1): Whatever actions the other agents choose, choosing "D" (Defection) always 
produces a better reward: $R(i - 1) > Ri - E$. 

(2): The agents receive a better reward if they all choose "C" (Cooperation) rather 
than "D": $RN - E > 0$. 
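With the parameter values used in Section 4 (R = 1, N = 10, E = 5), both dilemma conditions can be checked mechanically. The sketch compares, for every possible number i of cooperators, a cooperator's reward Ri - E with the R(i - 1) it would receive by resting instead; any cost paid by every agent regardless of its action cancels out of this comparison. Function names are illustrative.

```python
R, N, E = 1, 10, 5  # parameter values reported in Section 4


def cooperate_reward(i: int) -> int:
    """Reward to a cooperator when i agents (itself included) move cargo."""
    return R * i - E


def defect_reward(i: int) -> int:
    """Reward the same agent would get by resting: only i - 1 pieces move."""
    return R * (i - 1)


# Condition (1): defecting pays more, whatever the other agents do.
condition_1 = all(defect_reward(i) > cooperate_reward(i) for i in range(1, N + 1))
# Condition (2): universal cooperation beats universal defection (reward 0).
condition_2 = R * N - E > 0
```

Condition (1) reduces to E > R and condition (2) to RN > E, so the dilemma holds for any R < E < RN; with these values, full cooperation yields 5 per agent against 0 under full defection.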

The above-mentioned "Society Dilemma" [3] extends the one-on-one payoff table to a 
payoff chart (Fig. 1, right) whose horizontal axis shows the number of cooperating 
agents [2]. The two lines in the chart represent the payoff received by defecting 
agents (thick line) and by cooperating agents (thin line). 



3 Structure of Competitive Society and Agents 

In the computational simulations, when i agents choose "C" at a step, the reward 
(energy) received by each cooperator is $C_i = Ri - E - E_0$, and $D_i = Ri - E_0$ is the 
reward received by the agents that choose "D", where $E_0$ is the energy required for 
existence. To maintain the population balance of the society in this simulation, it is 
set to $E_0 = (RN - E)/2$. 

Thus, agents compete for existence according to the following energy rules. Initially, agent $a$ is given a certain amount of energy ($Eng_a$) for existence. The agents then indicate their intention to carry cargo according to their own strategy engines (see below).

The environment (energy rules) provides energy as a reward for the agents based on their decisions $act_a \in \{C, D\}$:

$$
Eng_a(t+1) = \begin{cases} Eng_a(t) + C_i & \text{if } act_a(t) = C \\ Eng_a(t) + D_i & \text{if } act_a(t) = D \end{cases} \qquad (1)
$$



If not enough agents cooperate, the amount of reward energy will be less than that of the expended energy. When an agent has expended all of its energy, it is culled from the society system, and a new agent with randomly determined parameters is introduced into the system.
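A minimal sketch of energy rule (1) together with culling and replacement. It assumes the payoff reading $C_i = Ri - E - E_0$, $D_i = Ri - E_0$ with $E_0 = (RN - E)/2$, treats “expended all of its energy” as energy $\le 0$, and replaces the paper’s GA-evolved strategy engine with a hypothetical fixed cooperation probability `p_cooperate`; `Agent` and `step` are illustrative names.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    energy: float
    # Hypothetical stand-in for the GA-evolved strategy engine: the
    # probability that the agent chooses "C" at each step.
    p_cooperate: float


def step(agents, R, E, initial_energy, rng):
    """Apply one interaction of energy rule (1), then cull and replace."""
    N = len(agents)
    acts = ["C" if rng.random() < a.p_cooperate else "D" for a in agents]
    i = acts.count("C")                     # number of cooperators this step
    e0 = (R * N - E) / 2                    # existence cost E_0
    c_i, d_i = R * i - E - e0, R * i - e0   # rewards C_i and D_i
    next_agents = []
    for agent, act in zip(agents, acts):
        agent.energy += c_i if act == "C" else d_i
        if agent.energy <= 0:
            # Culled: a newcomer with randomly determined parameters replaces it.
            agent = Agent(initial_energy, rng.random())
        next_agents.append(agent)
    return next_agents, i


if __name__ == "__main__":
    rng = random.Random(0)
    agents = [Agent(1000.0, rng.random()) for _ in range(10)]
    for _ in range(50):                     # 50 interactions per generation
        agents, n_coop = step(agents, R=1, E=5, initial_energy=1000.0, rng=rng)
    print([round(a.energy, 1) for a in agents])
```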

In this model, at time step $t$, agent $a$ has an independent payoff table $T_a$, and the agent generates its action strategies $st_a(t)$ according only to that table.

Since each agent has a different standard of action, we call each agent’s payoff table its “norms of behavior”, in a human sense. As shown in Fig. 2, the payoff table assigns an evaluated value to each of the agent’s actions (“C” or “D”) for each possible number of agents in the whole system that chose “C” at the same time.

Fig. 2. Norms of Behavior Matrix (Payoff Table $T_a$) for IPD (columns: number of cooperators; rows: actions C and D). The entries represent the evaluated values of the actions of an agent $a$.
The action strategies $st_a$ that each agent stores independently are coded as strings based on the history of past actions of all the agents [6]. These strategies are refined by genetic operations (GA). The values in the matrix $T_a$ are used as the fitness of each strategy at every interaction. Fundamental behaviors of simple games between agents with the same payoff table (IPD norms) are studied in [1].



4 Computational Experiments 

The above-mentioned parameters $R = 1$, $N = 10$ and $E = 5$ were used for 1000 generations of computational simulation of the competitive social system. The number of action strategies ($n$) in each agent’s gene pool is 20. The initial amount of energy ($Eng_a(0)$) for each newly generated agent is 1000. Interactions are repeated 50 times in each generation.

The results of experiments performed using the above parameters are shown in Fig. 
3(Left). 




Fig. 3. Average Values of Agents’ Norms of Behavior for each Generation (Left) and Landscape of the “dilemma” over the parameter space of the multi-agent system (Right)



The averages of the values in each payoff table of all agents are plotted for every generation in Fig. 3 (Left). Until the 500th generation, the system was not stable because the agents could not recognize or adapt to the environment (energy rules), so they were continually being replaced by new agents. After that, agents that could adapt to the environment began to accumulate within the system. From that point on, no agents became extinct, and the system was fixed. At that time, the highest acquired value in the norms-of-behavior table was ; i.e., the agents’ norms of behavior showed a preference for cooperating with each other. However, when each agent’s norms table was investigated, it was found that although eight agents valued cooperative action (“C”) highly and worked hard, two agents received higher reward energy by defecting (“D”). In other words, it is a society in which a group of sucker (cooperative) agents is dominated by a much smaller group of extortionate (defecting) agents.



4.1 Evaluation and Analysis of the Problem Field

In the above experiment, a society in which the agents could achieve their objective was constituted by about the 500th generation; however, the difficulty of finding an appropriate society is thought to depend on the structure or parameters of the environment. Fig. 3 (Right) shows the landscape of the average search time over 20 trials for each value of $R$ and $E$, that is, the average number of generations until the system became fixed and an agent society was acquired through the selection of agents with independent norms of behavior. The maximum searching time is set to 5000 generations.

Fig. 3 (Right) shows that the searching time landscape produced a ridge in the 
parameter space of R and E . The location of this ridge corresponds to the 
conditions of the dilemma, and the height represents the difficulty of the search for 
the solution to the dilemma. 
On the other hand, in the two parameter domains separated by the dilemma ridge, the searching time needed to constitute an appropriate society was extremely short; i.e., there is a broad range of stable solutions corresponding to combinations of the agents’ norms of behavior, so the search for a society was easy. In the area in front of the ridge (the domain of small $R$ and large $E$), the acquired society was lazy (“defecting” was the dominant strategy) because an environment with such parameters favors defecting. In contrast, in the area behind the ridge (the domain of large $R$ and small $E$), the acquired society was composed mainly of hard-working agents.

These results clarify the relationship between the IPD and a multi-agent system in which the agents aim to achieve a set task and to preserve themselves. At the same time, these results also demonstrate the complexity of that relationship.



5 Conclusion 

We attempted to construct and investigate a competitive society model consisting of agents with self-organized norms of behavior that face complex requirements from the environment. In this model, agents competed through game interactions according only to their own norms of behavior.

The environment for this system consisted of rules for the acquisition of rewards. In this study, the Iterated Prisoner’s Dilemma model was introduced as the environment rules. In such an environment, a group of agents that can recognize the IPD rules and cooperate with each other self-organizes in the computer simulations. At the same time, the agents form themselves into two groups with different norms of behavior that show a kind of dominance relationship, as in real societies.

We consider that the setting of independent norms of behavior enables the 
emergence of a society among the agents. Further, such a competitive social system 
model also implies that multi-agent systems can harmonize themselves to their 
environment autonomously through the interactions of the agents. 
Abstract. With regard to the dynamics of games, a framework that we call the “Dynamical Systems Game” model is presented, in which the game itself can be affected and changed by the players’ behaviors or states. The relation between the game dynamics and the evolution of strategies is discussed by applying this model. Computer experiments are carried out on a simple model to show the evolution of dynamical systems that make effective use of resources.



1 Main Issues about Game Dynamics 

In this paper, we would like to introduce a framework model to deal with the 
issues of “game dynamics.” In the real world, when we decide to select an action 
and carry it out, our behavior sometimes changes our own game environment. A 
change in the environment may also have an effect on a player’s decision-making 
process. Further, the utility of the same behavior sometimes varies according 
to an individual’s (or others’) current circumstances. In game theory [3], such situations are sometimes represented by one (large) game: that is, from the past into the future, all possible actions of all players at all points of time are taken into account. Thus all possible bifurcation patterns of the game are derived, and the situation as a whole is depicted as one huge game-tree. In this way, we can project the course of time onto a static game and analyze its solution in the form of a game-tree or a game matrix. “Strategy” here means the action plan for all points in time, and the rational solution of a game can be analyzed only when we know all the possible actions from the past to the future. It seems, however, that we ourselves do not always make decisions in this way, even when it is theoretically possible. Here we would like to present another framework model whose formulation can naturally deal with the above issues. This is what we call the “Dynamical Systems (DS) Game model,” in which the game itself is described as a dynamical system¹. A detailed description of the DS Game is given in [1].

¹ This type of problem was first considered by Rashevsky [5] and Rapoport [4]. Furthermore, Rossler introduced an abstract model of multiply linked coupled autonomous optimizers [6] as a model for complex biological systems, in particular for the brain.







2 Lumberjacks’ Dilemma as a DS Game 

Modeling As an application of this DS Game framework, we present in this 
paper what we call the “Lumberjacks’ Dilemma (LD) Game.” Let us consider 
the following situation: 

There is a wooded hill where several lumberjacks live. The lumberjacks fell the 
trees for their living. They can maximize their collective profit if they cooperate in 
waiting until the trees have fully grown before felling them, and sharing the profits. 
However, any lumberjack who fells a tree earlier will take the entire profit on that 
tree. Thus each lumberjack can maximize his personal profit by cutting trees earlier. 
If all the lumberjacks do this, however, the hill will go bald and there will eventually 
be no profit. This situation inevitably brings about a dilemma. 

This LD Game can be categorized as a social dilemma, which deals with the problem of forming and maintaining cooperation in a society, as represented by the classical story “the tragedy of the commons” presented by Garrett Hardin in 1968 [2]. Its structure is logically similar to that of the Prisoners’ Dilemma model if considered at the level of a static game; in other words, it can be represented as an n-person version of the Prisoners’ Dilemma if we project it onto static games. Here we note several important differences. The dynamics of the size of the trees should be expressed explicitly in this LD Game. The yield of a tree, and thus the lumberjacks’ profit, depends on when the tree is felled. The profits have a continuous distribution, because the yield of a tree can take continuous values. A lumberjack’s decision today can affect the future game environment through the growth of a tree.

By formulating the above game concretely (the details of the model are omitted here), we have analyzed the game dynamics of LD Games and investigated the nature of evolutionary phenomena among players in LD Games based on computer simulations. From among the LD Games under various conditions (e.g. the number of players, the number of trees, the way strategies are implemented, etc.), we present in this paper the 1-person 1-tree LD Games as an introduction to DS Games, focusing on the relation between game dynamics and the player’s decisions in a setting where no conflict among players’ decisions exists.

Experiments on 1-Person LD Games Let us consider a simplified model of a 1-person 1-tree LD Game. Here we make two simplifications. First, the player never refers to his or her own state; that is, the player makes his or her decision only by referring to the size of the tree. Second, the player cuts the tree if the size of the tree exceeds a certain value, called the decision value, $x_d$. We denote the size of the tree by $x$ and the state of the player by $y$. Note that $x$ and $y$ are multidimensional vectors in the case of n-person m-tree LD Games. The value of $x_d$ uniquely determines the time series of the phase $(x, y)$. The attractor of the time series can be a fixed point or a periodic, quasiperiodic, or chaotic motion, depending on the dynamical law given to the system.

Fig. 1. 1-person 1-tree mountainside-type LD Game. (a) AGS Diagram: the change of the attractor with the strategy is plotted. The set of values of $x$ on the attractor (all the values that $x$ takes between the 200th and 400th rounds) is plotted against the decision value $x_d$ on the horizontal axis. For example, the two parallel straight segments around $x_d = 0.8$ show that the dynamics of $x$ is attracted to a period-2 cycle between values around 0.3 and 0.6. (b) Average score landscape: the average score from the 0th to the 400th round is plotted against the decision value $x_d$. (c) Complicated structure in the average score landscape of an evolutionary simulation: the average score landscape based on the decision-making function of the player who actually appeared in the 1-person 1-tree simulation (the 5848th generation) is shown.

As in a ‘bifurcation diagram,’ we have plotted in Figure 1-(a) the set of values that $x$ takes on the attractor as $x_d$ changes. The figure shows how the attractor of the game dynamics changes with a parameter of the decision-making function. Let us call such a figure the AGS Diagram (the transition of the Attractor of the Game dynamics versus the change of the Strategy). With the AGS Diagram, one can study how the nature of the game dynamics shifts among various states (fixed-point/periodic/chaotic game dynamics, productive/unproductive game dynamics, etc.) as the decision-making changes. Two characteristics of Figure 1-(a) are noteworthy: (1) for each decision value, the corresponding attractor is always a periodic cycle; (2) there are infinitely many ‘plateaus,’ over each of which the attractor stays exactly the same across a range of decision values (see, for example, the period-2 and period-3 plateaus in Figure 1-(a)).

Corresponding to Figure 1-(a), the average score that the player gets during the attractor part of the game dynamics is plotted in Figure 1-(b) against the decision value $x_d$. We call this diagram the average score landscape. As far as can be seen at the scale of this figure, the optimal decision value lies in the period-2 plateau around 0.8. Consequently, the best strategy for the player appears to be to construct this period-2 dynamics, which was indeed observed in the early stage of the LD Game simulation described below. However, a close-up of the left-hand side of this period-2 plateau in Figure 1-(a) (indicated by the arrow labeled (L)) reveals that infinitely many plateaus accumulate there and form a fractal structure (a so-called devil’s staircase), which contains dynamics more profitable than period-2; among them, period-11 is optimal (the most productive) for the player.
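To make the construction of an AGS Diagram and an average score landscape concrete, here is a minimal sketch. The logistic growth law `grow`, the `stump` size, and the yield (the wood above the stump) are hypothetical stand-ins, since the paper omits the details of its model; only the threshold rule (cut when $x$ exceeds $x_d$) and the observation windows (rounds 200 to 400 for the attractor, 0 to 400 for the score) come from the text. It is not expected to reproduce Figure 1.

```python
def grow(x: float, rate: float = 0.5) -> float:
    """Hypothetical logistic growth of the tree size x (0 < x < 1)."""
    return x + rate * x * (1.0 - x)


def simulate(x_d: float, stump: float = 0.1, rounds: int = 400):
    """Threshold rule: cut the tree whenever its size exceeds x_d.

    Returns the tree sizes x_t and the scores for rounds 0..rounds.
    The yield (wood above a fixed stump) is a hypothetical choice.
    """
    xs, scores, x = [], [], stump
    for _ in range(rounds + 1):
        xs.append(x)
        if x > x_d:
            scores.append(x - stump)
            x = stump                        # the tree regrows from the stump
        else:
            scores.append(0.0)
            x = grow(x)
    return xs, scores


def attractor(xs, start=200):
    """Distinct values of x from round `start` on: one column of an AGS Diagram."""
    return sorted(set(xs[start:]))


def period(xs, start=200):
    """Smallest p with x_t == x_{t+p} throughout the window, else None."""
    w = xs[start:]
    for p in range(1, len(w) // 2):
        if all(w[t] == w[t + p] for t in range(len(w) - p)):
            return p
    return None


def average_score(x_d: float) -> float:
    """One point of the average score landscape (rounds 0..400)."""
    _, scores = simulate(x_d)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    grid = [0.05 + 0.01 * k for k in range(90)]
    for x_d in grid[::10]:
        xs, _ = simulate(x_d)
        print(f"x_d={x_d:.2f} period={period(xs)} attractor size={len(attractor(xs))}")
    best = max(grid, key=average_score)
    print(f"best decision value on this grid: {best:.2f}")
```

Because the tree in this toy model restarts from the same stump after every cut, each decision value gives an exactly periodic orbit, and all decision values that need the same number of growth steps share one attractor, which is the simplest kind of plateau.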

For the 1-person “line-type” LD Game, the AGS Diagram shows that the dynamics is attracted to a quasiperiodic motion if $x_d < 2/3$ and to a periodic motion otherwise. The average score landscape is not stepwise but a monotonically increasing straight line for $x_d < 2/3$, and the optimal decision value lies in $x_d > 0.75$ for this 1-person line-type LD Game.



Simulations on Evolutionary 1-Person LD Games An LD Game in general has several trees and several lumberjacks. Here, let us consider an evolutionary LD Game with a single lumberjack as the simplest example. Note that a dilemma-like situation exists even in 1-person LD Games (although the meaning of ‘dilemma’ differs from that in 2-person games): by cutting the tree, one always gets a higher score at the next step, which is usually not good for long-term fitness. What is needed here is a strategy that takes the dynamics of the game into account.

Figures 2-(a) and 2-(b) (each called a “fitness chart”) plot, against generation, the fitness obtained by the fittest species of each generation (the “generationarily fittest species”), which we call the “fittest value.” In a fitness chart, the horizontal axis shows the generation, while the vertical axis shows the fittest value of each generation. Figures 2-(a) and 2-(b) show that the fittest value increases monotonically and stepwise with generation. This monotonic increase is rather natural, since in a 1-person game the fitness is determined solely by the player’s own strategy.
Fig. 2. “Fitness chart” in a 1-person LD Game: The fitness chart of a mountainside-type LD Game is shown in (a) up to the 10th generation and in (b) for generations 10-70. In all the figures, the horizontal axis shows the generation. The vertical axis shows the “fittest value” of each generation, which is the fitness value of the fittest species of that generation. (c) The fitness chart of the 1-person line-type LD Game from the 1st to the 70th generation is plotted with a curve fitted by an exponential function.



The dynamics of the player’s actions and of the states of both the player and the tree observed in this mountainside-type LD Game are attracted to periodic cycles. For example, period-2 dynamics is shared by the generationarily fittest species over many generations, during which the evolution of the dynamics is seen only in the transient before the dynamics falls onto the cycle. This period-2 regime lasts from the 49th generation to the 3605th generation, when period-7 dynamics is achieved. Later in the evolution, a new dominant species with period-23 dynamics appears at the 5848th generation, and one with period-11 dynamics (the optimal attractor of game dynamics for the player) appears at the 8984th generation. The time required for the evolution of the periodic pattern gets longer at later stages, because the evolution of the species is likely to be trapped in meta-stable states, as can be seen in the average score landscape of Figure 1-(c), which is drawn based on the decision-making function of a player who appeared in the simulation of the 1-person 1-tree LD Game.

For the simulation of the 1-person line-type LD Game, the optimal game dynamics is realized easily, as early as the 45th generation. The fitness chart in Figure 2-(c) shows that the fittest value increases gradually, rather than stepwise, with generation. Such smooth and quick evolution reflects the structure of the AGS Diagram and, consequently, of the average score landscape.
3 Discussion

Let us discuss the general advantages of the DS Game model over other models. DS Game modeling is suitable for studying the evolution or learning of decision-making subjects living in a world that can be described by a dynamical system. Models of game theory certainly have strength in dealing with decision-making subjects who interact with each other, but they are not, by nature, suited to dynamics, and so they cannot address issues that can be studied only at the level of dynamics. For example, the devil’s staircase in the AGS Diagram leads to stepwise innovation in the mountainside-type LD Game, while evolution is gradual in the line-type LD Game, whose dynamic structure is simple. In 1-person games, the line-type version has the same structure as the mountainside-type one from the static point of view, in that a player who cuts the tree as early as possible can take the short-range profit but loses the long-range gains. However, a fundamental difference appears at the level of game dynamics and of the evolution of strategies. (At the level of games with plural persons, the present modification keeps the common social dilemma that exists in the previous version.) Thus, several different DS Games can be categorized into the same static game if modeled by traditional game theory, but these games may have a completely different nature at the level of the dynamical structure, especially when evolution or learning is involved.
Abstract. Field Programmable Gate Arrays (FPGAs) can provide the most suitable circuits for given problems by reconfiguring their circuits. In this paper, we show that an FPGA chip can achieve a speedup of about 120 times compared with a workstation (Ultra-Sparc 200 MHz) in the computation of a co-evolution of strategies and scores in the Iterated Prisoner’s Dilemma game. This speedup makes it possible to tackle more complex problems that are beyond the limits of software.



1 Introduction 

Field programmable gate arrays (FPGAs) can be reconfigured for each problem by downloading configuration data from memory (ROM) or from computers. Thus, systems with FPGAs can provide the most suitable circuits for given problems. In this paper, we show that an FPGA chip can achieve a speedup of about 120 times compared with a workstation (Ultra-Sparc 200 MHz) in the computation of a co-evolution of strategies and scores in the Iterated Prisoner’s Dilemma (IPD) game. By processing IPD games in parallel with pipelined circuits implemented in an FPGA chip, the computation time of the model for one initial state can be reduced from several days to about an hour. This drastic speedup makes it possible to study behaviors that appear over a large number of generations with a large number of agents [6].

The co-evolution model consists of two layers. In the first layer, scores for IPD games are evolved using a genetic algorithm. Scores vary within the range allowed by the Prisoner’s Dilemma game, and scores that attract more agents in the second layer gradually increase in number. In the second layer, agents play IPD games with all other agents, following the scores that they believe in, and evolve using Lindgren’s model [1, 2]. With this model, we can observe how cooperative behaviors and scores emerge, which may lead to an understanding of the emergence of social morals.



2 Hardware Computation 

In evolutionary computations, the same sequences of operations are repeatedly applied to a large number of individuals (agents). With dedicated hardware, the sequences can be applied to all individuals in parallel, and each computation of the sequences can be pipelined. The expected speedup by hardware is N (the number of agents) × P (the speedup by pipeline processing, which is almost equal to the depth of the pipeline when the data sequences are long enough). However, dedicated hardware systems built with ASICs cannot deal with many problems, because the details of the sequences depend on the given problem and vary considerably.
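The factor $P$ can be made concrete with a simple timing model, assumed here rather than taken from the paper: if a $P$-stage pipeline accepts one input per clock cycle, $L$ inputs need $L + P - 1$ cycles instead of $L \cdot P$, so the gain tends to $P$ as $L$ grows. The function names are hypothetical.

```python
def pipelined_cycles(items: int, depth: int) -> int:
    """Clock cycles for `items` inputs through a `depth`-stage pipeline that
    accepts one input per cycle: depth - 1 cycles to fill, then one result
    per cycle."""
    return items + depth - 1


def sequential_cycles(items: int, depth: int) -> int:
    """Clock cycles if each input must leave the last stage before the next
    input enters the first one."""
    return items * depth


def hardware_speedup(n_parallel: int, items: int, depth: int) -> float:
    """N x P: N parallel units, each gaining P from pipelining."""
    p = sequential_cycles(items, depth) / pipelined_cycles(items, depth)
    return n_parallel * p


if __name__ == "__main__":
    for items in (10, 100, 10_000):
        print(items, round(hardware_speedup(24, items, depth=3), 2))
```

For a three-stage pipeline this gives $L + 2$ cycles for $L$ inputs. The 24 parallel units in the usage lines mirror the 24 game units of Section 4.1 but are only illustrative; the measured speedup also depends on clock rates and on the software side.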

Field programmable gate arrays can be reconfigured for each problem. Thus, systems with FPGAs can provide the most suitable circuits for given problems. The speed and size of FPGAs have improved drastically in recent years, and several systems show very good performance in evolutionary computations [3, 4, 5]. In evolutionary computations, the size of FPGAs is especially important because, as described above, the speedup by hardware is proportional to the number of agents processed in parallel. The size of LSIs (and, of course, of FPGAs) is improving steadily compared with their operating speed. Therefore, the speedup by hardware computation will increase as LSI technologies progress.

3 Co-evolution of Strategies and Scores 

3.1 Co-evolution model




Fig. 1. Co-evolution of Strategies and Scores 



In this section, we describe the details of a co-evolution model of strategies and scores. Figure 1 shows an overview of the model. The scores in the payoff matrix $(3+\gamma_i,\ 0+\beta_i,\ 5+\alpha_i,\ 1+\delta_i)$ for IPD games are evolved using a simple genetic algorithm. Scores vary within the range allowed by the Prisoner’s Dilemma game, and a score that attracts more agents in the second layer gradually increases. In Figure 1, agents A, B, C and F follow Score-2, and no agent follows Score-3 (each agent chooses at random either one of the scores or no score at all). The fitness of each score is calculated from the total points gained by the IPD agents that follow it. Therefore, Score-2 will increase in the next generation, and Score-3 may be deleted (agents that had followed Score-3 then choose new scores at random).
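A sketch of how the fitness of the scores could be computed from their followers, followed by a selection step. Only the fitness definition (total points of the following agents) comes from the text; fitness-proportional selection via Python’s `random.choices`, the clamping of negative totals, the omission of crossover and of mutation of $\alpha_i$, $\beta_i$, and the names `score_fitness` and `select_scores` are assumptions.

```python
import random


def score_fitness(follows, points, n_scores):
    """Fitness of each score = total IPD points of the agents that follow it.

    follows[a] is the index of the score agent a follows, or None."""
    fitness = [0.0] * n_scores
    for agent, score in enumerate(follows):
        if score is not None:
            fitness[score] += points[agent]
    return fitness


def select_scores(scores, fitness, rng):
    """Hypothetical fitness-proportional selection of the next score population.

    Negative totals are clamped to zero, so a score without followers (or
    with a negative total) is never chosen unless every weight is zero."""
    weights = [max(f, 0.0) for f in fitness]
    if sum(weights) == 0:
        return list(scores)
    return rng.choices(scores, weights=weights, k=len(scores))


if __name__ == "__main__":
    # Four agents follow Score-2 (index 1), one follows Score-1, one follows none.
    follows = [1, 1, 1, 0, None, 1]
    points = [12.0, 9.0, 11.0, 4.0, 7.0, 10.0]
    fit = score_fitness(follows, points, n_scores=3)
    print(fit)                       # [4.0, 42.0, 0.0]
    print(select_scores(["Score-1", "Score-2", "Score-3"], fit, random.Random(0)))
```

In the usage lines, the score with no followers (like Score-3 in Figure 1) gets zero fitness and therefore cannot be drawn into the next population.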

IPD agents get their points by playing IPD games with all other agents using the scores they follow, and evolve using Lindgren’s model [1, 2]. When an agent does not follow any score, the default points are used.

With this model, we aim to eliminate subjective interpretation of the agents’ behaviors and to observe how cooperative behaviors and scores emerge.



3.2 Scores 

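Table 1 shows the payoff matrix, in which $\alpha_i$, $\beta_i$, $\gamma_i$ and $\delta_i$ are parameters. By changing the values of these parameters, the game falls into one of the following categories:

$$
\begin{aligned}
&T > P > R > S && \text{Deadlock} && (1)\\
&T > R > P > S && \text{Prisoner's Dilemma} && (2)\\
&T > R > S > P && \text{Chicken} && (3)\\
&R > T > P > S && \text{Stag Hunt} && (4)
\end{aligned}
$$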



In our evaluation, we fixed $\gamma_i$ and $\delta_i$ to zero and let $\alpha_i$ and $\beta_i$ vary within the ranges (5) and (6), in order to satisfy the requirements of the Prisoner’s Dilemma game ((2) and (7)). An agent gains a larger payoff by defecting against another agent when $\alpha$ is larger, and has a more bitter experience when defected against when $\beta$ is smaller (close to $-1$).



$$0 < \alpha_i < 1 \qquad (5)$$

$$-1 < \beta_i < 0 \qquad (6)$$

$$S + T < 2R \qquad (7)$$
Table 1. Payoff Matrix (payoff to the row player)

                 Cooperate                          Defect
  Cooperate      $R = 3 + \gamma_i$ (Reward)         $S = 0 + \beta_i$ (Sucker)
  Defect         $T = 5 + \alpha_i$ (Temptation)     $P = 1 + \delta_i$ (Punishment)
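To see how constraints (5) and (6) place the game in the Prisoner’s Dilemma category and satisfy (7), the sketch below classifies a payoff matrix by the orderings (1)-(4). It assumes the reading $R = 3 + \gamma_i$, $S = 0 + \beta_i$, $T = 5 + \alpha_i$, $P = 1 + \delta_i$ of the payoff list $(3+\gamma_i, 0+\beta_i, 5+\alpha_i, 1+\delta_i)$; the function names are illustrative.

```python
def payoffs(alpha=0.0, beta=0.0, gamma=0.0, delta=0.0):
    """(R, S, T, P) of the parameterised payoff matrix (assumed mapping)."""
    return 3 + gamma, 0 + beta, 5 + alpha, 1 + delta


def category(R, S, T, P):
    """Classify a 2x2 symmetric game by the orderings (1)-(4)."""
    if T > P > R > S:
        return "Deadlock"
    if T > R > P > S:
        return "Prisoner's Dilemma"
    if T > R > S > P:
        return "Chicken"
    if R > T > P > S:
        return "Stag Hunt"
    return "other"


if __name__ == "__main__":
    # Sample the ranges (5) and (6) with gamma = delta = 0.
    for alpha in (0.01, 0.5, 0.99):
        for beta in (-0.99, -0.5, -0.01):
            R, S, T, P = payoffs(alpha, beta)
            print(alpha, beta, category(R, S, T, P), S + T < 2 * R)
```

With $0 < \alpha < 1$ and $-1 < \beta < 0$ one always has $T > R > P > S$, and $S + T = 5 + \alpha + \beta < 6 = 2R$, so (7) holds automatically.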
3.3 Strategies

Strategies (agents) evolve using Lindgren’s model. In Lindgren’s model, each agent decides its next move according to (at most) the five previous moves. Thus, each agent has a table of at most 32 bits, and both the values and the length of the table are mutated. Noise is assumed in the communication between two agents, so a move by one agent is not always reported correctly to the other agent.
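The lookup-table strategies can be sketched as follows, in the spirit of the description above and of the hardware in Section 4.3 (a bit selected by the recent history, noise applied with an XOR). The ordering of moves in the history (own move, then the reported opponent move), the mutation rates, and all names are assumptions, not details given in the paper.

```python
import random

MAX_TABLE = 32          # at most 2**5 entries: memory of five past moves
C, D = 0, 1             # moves encoded as bits


def next_move(table, history):
    """Look up the next move. len(table) == 2**m; the last m entries of
    `history` (oldest first, padded with C at the start of a game) form
    the binary index into the table."""
    m = len(table).bit_length() - 1
    recent = ([C] * m + history)[-m:] if m else []
    index = 0
    for bit in recent:
        index = (index << 1) | bit
    return table[index]


def report(move, noise, rng):
    """Noisy channel: XOR the move with a noise bit that is 1 with prob. `noise`."""
    return move ^ (1 if rng.random() < noise else 0)


def mutate(table, rng, p_point=0.01, p_dup=0.001, p_split=0.001):
    """Lindgren-style mutations (rates are hypothetical): point flips, gene
    duplication (memory + 1, same behaviour) and split (memory - 1)."""
    table = [b ^ 1 if rng.random() < p_point else b for b in table]
    if rng.random() < p_dup and len(table) < MAX_TABLE:
        table = table + table
    elif rng.random() < p_split and len(table) > 1:
        half = len(table) // 2
        table = table[:half] if rng.random() < 0.5 else table[half:]
    return table


def play(table_a, table_b, rounds, noise, rng, payoff):
    """Iterated game; payoff[(mine, theirs)] gives points. Each agent's
    history holds its own move and the (possibly corrupted) report of the
    opponent's move. Returns the total points of both agents."""
    hist_a, hist_b = [], []
    score_a = score_b = 0.0
    for _ in range(rounds):
        move_a, move_b = next_move(table_a, hist_a), next_move(table_b, hist_b)
        score_a += payoff[(move_a, move_b)]
        score_b += payoff[(move_b, move_a)]
        hist_a += [move_a, report(move_b, noise, rng)]
        hist_b += [move_b, report(move_a, noise, rng)]
    return score_a, score_b


if __name__ == "__main__":
    PD = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}   # default scores
    TIT_FOR_TAT, ALL_D = [C, D], [D]
    rng = random.Random(0)
    print(play(TIT_FOR_TAT, TIT_FOR_TAT, 100, 0.01, rng, PD))
    print(play(TIT_FOR_TAT, ALL_D, 100, 0.01, rng, PD))
```

With this encoding, the two-entry table `[C, D]` is Tit-for-Tat, and a duplicated table behaves exactly like the original, so duplication lengthens the memory without changing behaviour until later point mutations act on the new entries.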

In our model, agents get their payoffs using the payoff matrix (scores) that they follow. In the initial state, no agent believes in any score, and the default payoff matrix ($\alpha$ and $\beta$ are zero) is used. Through mutation, agents begin to follow one of the scores, or stop following a score.

3.4 An Example of Co-evolution of Strategies and Scores 

Figure 2 shows the average points obtained by agents in this model. From generation 0 to 20,000, various strategies appear and vanish. At around generation 20,000, defective strategies begin to dominate the world, and they are replaced by cooperative strategies around generation 27,500. These drastic changes are caused by the emergence of defective and cooperative strategies through mutation.




Fig. 2. Average Points of Agents (horizontal axis: generation)



Figure 3 shows the distribution of the distances between the scores (20 scores are used in this evaluation). The maximum distance between two scores is $\sqrt{2}$, because $\alpha$ and $\beta$ vary in the ranges (5) and (6), respectively. In Figure 3, the spread of the distribution in the initial phase (from generation 0 to 2,500) is very large. In this phase, many kinds of strategies are created by mutation and can survive, because the strategies have not yet evolved enough. However, the spread becomes smaller as the strategies evolve, and scores with larger $\alpha$ (close to 1.0) and larger $\beta$ (close to 0) become dominant (around generation 2,500). This pair of $\alpha$ and $\beta$ gives an agent more points when it defects against another agent, and less damage when it is defected against. The spread begins to increase around generation 24,000. As shown in Figure 2, the world around this generation is dominated by defective strategies. The points obtained by two defective agents are 1, as shown in Table 1, independent of $\alpha$ and $\beta$. Therefore, mutation in $\alpha$ and $\beta$ is neutral, and the spread gradually increases during this phase. Around generation 27,500, the defective strategies are suddenly replaced by cooperative strategies. The spread continues to increase in spite of this change, because the points obtained by two cooperative agents are also independent of $\alpha$ and $\beta$. Around generation 30,000, some strategies that sometimes defect against another agent begin to appear, and the spread begins to decrease.
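The $\sqrt{2}$ bound follows because every score lies in the unit square $0 < \alpha < 1$, $-1 < \beta < 0$, whose diagonal has length $\sqrt{2}$. A minimal sketch of the distance computation behind such a histogram, assuming plain Euclidean distance on the $(\alpha, \beta)$ plane (the metric is not stated in the text); the 20 random scores in the usage lines are hypothetical.

```python
import itertools
import math
import random


def pairwise_distances(scores):
    """Euclidean distances between all pairs of (alpha, beta) score points."""
    return [math.hypot(a1 - a2, b1 - b2)
            for (a1, b1), (a2, b2) in itertools.combinations(scores, 2)]


if __name__ == "__main__":
    rng = random.Random(0)
    # 20 hypothetical scores drawn uniformly from 0 < alpha < 1, -1 < beta < 0
    scores = [(rng.random(), -rng.random()) for _ in range(20)]
    d = pairwise_distances(scores)
    print(len(d), max(d) <= math.sqrt(2))     # 190 True
```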




Fig. 3. Distribution of Scores 



Figures 4 and 5 show the positions of the scores at generations 20,326 and 28,343 on the $(\alpha, \beta)$ plane. In Figure 4, all scores gather around one point (larger $\alpha$ and larger $\beta$, as described above), while in Figure 5 the scores are dispersed because mutation in $\alpha$ and $\beta$ is neutral in the cooperative world.



Fig. 4. Location of Scores (Generation = 20326)

Fig. 5. Location of Scores (Generation = 28343; horizontal axis: alpha)

4 Hardware Implementation

4.1 Parallel Processing of IPD games

The size of an FPGA chip is not large enough to implement the computation of the whole model, so we have to decide which parts should be computed by hardware. The most time-consuming part of the model is the computation of the IPD games. Its cost is of order $N_{agents} \times N_{agents}$ ($N_{agents}$ is the number of agents), because every agent plays with every other agent. The costs of the other parts are at most of order $N_{agents}$ or $N_{scores}$ ($N_{scores}$ is the number of genes used in the genetic algorithm). $N_{agents}$ and $N_{scores}$ are 1024 (at maximum) and 20, respectively, in the current evaluation. Therefore, all of the hardware should be used for the computation of the IPD games, and the other parts can be processed by software without decreasing performance.

In the current hardware implementation, 24 games (48 agents) can be processed in parallel with one FPGA chip (ALTERA EPF10K100). To process more agents, we need external memory to store all the agents, and the agents on the chip have to be exchanged whenever all the games among the current 48 agents have finished. The time for exchanging agents is also negligible compared with the computation time of the IPD games.
4.2 Hardware for an IPD game

Figure 6 shows the hardware for an IPD game. In Figure 6, the next moves of two agents are computed in one clock cycle, and the payoffs of both agents are computed and accumulated in the next two clock cycles. These two phases are pipelined into three stages (the adders are pipelined into two stages in order to achieve a high operating speed), so $N$ repetitions can be finished in $N + 2$ clock cycles, as shown in Figure 7. In the hardware, the values of $\alpha$ and $\beta$ are represented as fixed-point variables in order to simplify the adders.

The two noise signals are also computed in parallel using pseudo-random number generators. To obtain high-quality pseudo-random numbers (with a long period) without slowing down the operating speed, the adders in the pseudo-random number generators are also pipelined.
Fig. 6. Block Diagram for an IPD game

Fig. 7. Pipeline Processing of IPD games

4.3 Details of the Circuits Deciding Next Move

Figure 8 shows details of the circuits for deciding the next move. In Figure 8, the strategy of an agent is implemented in registers, and one of its bits is selected according to the history. The history is also implemented in registers (shift registers) and updated every clock cycle. Noise is implemented using a pseudo-random number generator and an XOR (Ex-OR) gate.
Fig. 8. Details of the Circuits Deciding Next Move



4.4 Scheduling of the IPD games 

Figure 9 shows how all combinations of games among the agents are realized in
the hardware. In the hardware, it is very important to reduce the connections among
the circuits in order to implement more units in a single chip. In Figure 9, the boxes
are the circuits for IPD games, and the strings in the boxes indicate the initial positions of
the agents. All combinations of games are realized simply by connecting the units
serially and shifting the agents among them. The initial positions of the agents
are decided by a computer in advance and given to the hardware.
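The exact placement used by the authors is given in Fig. 9. As an illustration of how shifting alone can cover every pairing, the Python sketch below uses the classical circle method, a known round-robin scheme that may differ from the authors' layout: one agent stays fixed, all others move one position along a ring, and unit u plays the agents at positions u and n-1-u. With this layout every agent moves only within its unit or to an adjacent unit.

```python
def circle_schedule(n_agents: int) -> list[list[tuple[int, int]]]:
    """All-pairs schedule for an even number of agents (circle method).

    Each round lists the (agent, agent) pair played by each of the
    n_agents // 2 game units.
    """
    if n_agents % 2:
        raise ValueError("use an even number of agents")
    positions = list(range(n_agents))
    rounds = []
    for _ in range(n_agents - 1):
        rounds.append([(positions[u], positions[n_agents - 1 - u])
                       for u in range(n_agents // 2)])
        # Keep position 0 fixed; shift every other agent one step along the ring.
        positions = [positions[0], positions[-1]] + positions[1:-1]
    return rounds


schedule = circle_schedule(48)   # 47 rounds on 24 game units
```

For 48 agents this yields 47 rounds of 24 games, i.e. 1128 = 48 · 47 / 2 distinct pairings with no repeats.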



4.5 Results

Table 2 shows the computation time of all combinations of games for 48 agents.
The speedup of the FPGA is about 120 times compared with an Ultra-Sparc 200 MHz.
This high speedup is achieved by the parallel processing of IPD games and the
pipelined processing of each IPD game.
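The speedup column of Table 2 is simply the ratio of the two run times. The short Python check below reproduces it and, as a rough derived figure that the text does not give (it ignores everything except clock rate), the speedup per clock cycle implied by the 200 MHz and 33 MHz clocks.

```python
ultra_sparc_s = 119.02   # Table 2, Sun Ultra-Sparc (200 MHz)
fpga_s = 0.98            # Table 2, FPGA (33 MHz)

speedup = ultra_sparc_s / fpga_s
per_cycle = speedup * 200 / 33   # rough: normalise by the clock-rate ratio

print(f"{speedup:.2f}")     # 121.45
print(f"{per_cycle:.0f}")   # 736
```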

Fig. 9. Scheduling of the IPD games

Table 2. Simulation Time

                              time (sec)    speedup
Sun Ultra-Sparc (200 MHz)       119.02        1.00
FPGA (33 MHz)                     0.98      121.45

5 Conclusion

We have shown that an FPGA chip can achieve a speedup of more than a hundred times
compared with a workstation (Ultra-Sparc 200 MHz) in the computation of the
co-evolution model. This drastic speedup makes it possible to reduce the computation
time of one trial from several days to about an hour. The hardware algorithm is
scalable, and performance can be improved in proportion to the amount of hardware.
The latest FPGAs are more than twice as large as the FPGA used in this paper. With a
system consisting of several of these FPGAs, it is possible to achieve a speedup of
more than a thousand times. We believe that this high-speed computation of the
co-evolution of strategies and scores will help to understand the emergence of social
morals in the real world.
Abstract. This paper presents a flexible probabilistic model for the
description of aggregation processes in autonomous collective robotics. 
Two different experiments are considered; one carried out by the au- 
thors, the other by Beckers et al. [1] with teams of reactive autonomous 
robots which differ from a morphological as well as from a control point 
of view. Rather than simulating robots moving within an environment, 
the probabilistic model represents the clustering activity as a sequence of 
probabilistic events during which cluster sizes can be modified depending 
on simple geometrical considerations and robot control parameters. It is 
shown that, for both considered robotic platforms, the evolution of the 
cluster sizes is perfectly described, both qualitatively and quantitatively, 
by the probabilistic model. By comparing the results at the model level, 
a better understanding is gained of the influence of the interaction ge- 
ometry and of the robot control parameters on the collective aggregation 
dynamics. 



1 Introduction 

Biologically inspired collective robotics favours distributed solutions, i.e. solutions 
where coordination is not taken over by a special unit using private information 
sources, or concentrating and redistributing most of the information gathered 
by the individual robots. Inspired by the so-called collective intelligence demon- 
strated by social insects [2], bio-inspired collective robotics studies robot-robot 
and robot-environment interactions leading to robust, goal-oriented, and perhaps 
emergent group behaviours. 

Often, fully distributed control is combined with minimal robotic skills: 
robots are not able to communicate with each other, to plan their activity or
to adapt their behaviour continuously. With such simple controllers, a gathering 
task becomes essentially a geometrical problem with a probabilistic nature which 
is well adapted to be described by simple probabilistic models. 

The motivations for such modelling are twofold. Firstly, because of its
minimalist essence, it enables the investigation and the determination of which
characteristics of the experiment are most influential on the clustering process.
Secondly, working with probabilistic simulations saves time. In particular, it
would be useful to have a tool which allows the evaluation of critical
characteristics of an experiment before the robot’s final design is completed or
before a much more complicated sensor-based simulator is developed.

Collective distributed clustering of scattered objects is inspired by studies of
aggregation processes in social insects, such as ants [3]. In [1, 5, 7], similar
experiments were carried out with real robots and reactive control architectures.
In [1] and [5], a precise statistical analysis was carried out, but neither a model
of the experiment nor a comparison with simulation results was presented. In [7]
we presented a probabilistic model and compared its predictions with data
delivered by Webots, a 3D simulator of Khepera robots [8].

This paper firstly presents new results with real robots over long experiments
which perfectly match the predictions reported in [7] and, secondly, it shows
how the probabilistic model can describe an experiment with a completely different
robotic platform, such as the one used in Beckers et al.’s experiments [1].

2 Materials and Methods 

This section presents the two experiments at the experimental and at the mod- 
elling level. We focus on the experiment with the Khepera robots and we use 
the experimental data of Beckers et al. to demonstrate that our model is valid 
for a completely different set-up. The relevant differences between both set-ups 
are shown in Tab. 1. 





Fig. 1. a) Khepera equipped with gripper; b) robots used by Beckers et al. (photo
courtesy of Prof. J.-L. Deneubourg)



2.1 Experimental Set-Up 

Experiments with Khepera. All Khepera robots [9] used in the experiments
are extended with a gripper module (see Fig. 1a) and equipped with an IR



Fig. 2. a) Seed scattering at the beginning of the experiment and b) at the end of one of
the longest experiments (after about 2:05 h)






Fig. 3. a) Example of an aggregation process with 3 robots in Webots: seed scattering at
the beginning of the experiment and b) after about 4:15 h of simulated time



enhanced reflecting band (not shown in Fig. 1a). This band prevents moving
robots from being recognized as objects to grasp and increases the robot-robot
avoidance distance (see Tab. 1). The same robot configuration can be obtained
using the Webots simulator [8]. Webots is based on an "as realistic as possible"
reproduction of the sensor capabilities as well as of the robot-robot and
robot-environment interaction kinematics. For comparison, in this experiment
Webots runs about 15 times faster than real time on a Sun Ultra 1 workstation
with five robots.

Table 1. Comparison between Khepera’s and Beckers et al.’s set-ups. All geometrical
dimensions are given in cm. Beckers et al.’s robots are equipped with a 17 cm wide
C-shaped gripper (*), see Fig. 1b

                 Arena     Seeds             Robots          Detection range          Initial scattering
                           Ø     h     nb    dim      nb     seeds   walls   robots
Khepera          80x80     1.7   2.5   20    Ø 5.5    1-10   1.7     1.7     6.5      arbitrarily irregular
Beckers et al.   250x250   4     2.5   81    17x21    1-5    8.5*    20      20       regular grid

We can summarize the robot’s behaviour with the following simple rules. The
robot moves in the arena in a straight line looking for seeds. When its sensors
are activated by an object, the robot starts a discriminating procedure. Two
cases can occur. If the robot is in front of a large obstacle (a wall, another robot
or an array of seeds), the object is considered an obstacle and the robot avoids
it. Otherwise, the small object is considered a seed: if the robot is not carrying
a seed, it grasps the seed with the gripper; if the robot is already carrying a
seed, it drops the seed it is carrying close to the one it has found. In both
cases, it then resumes looking for seeds. Note that, because only the two
extreme seeds of a cluster can be identified as seeds (as opposed to obstacles)
by the robots, clusters are built in lines (see Fig. 2 for an example).
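Because the probabilistic model mirrors the controller flowchart, it helps to write the Khepera rules out explicitly. The Python sketch below is a hypothetical transcription of the rules listed above; the `Percept` categories and action names are illustrative, not the robots' firmware.

```python
from enum import Enum


class Percept(Enum):
    NOTHING = "nothing"
    LARGE_OBJECT = "large"   # a wall, another robot or an array of seeds
    SMALL_OBJECT = "small"   # a single seed, or the tip of a line cluster


def khepera_step(percept: Percept, carrying: bool) -> tuple[str, bool]:
    """One decision of the reactive controller.

    Returns the action taken and the new "carrying a seed" state.
    """
    if percept is Percept.NOTHING:
        return "move_straight", carrying
    if percept is Percept.LARGE_OBJECT:
        return "avoid", carrying
    # A small object is treated as a seed.
    if not carrying:
        return "grasp_seed", True
    return "drop_seed_next_to_it", False
```

Since a seed is only ever dropped next to another seed, this rule set never creates a new cluster; it can only grow, shrink or eliminate existing ones.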



Beckers et al.’s Experiments. In [1], each robot is equipped with IR sensors
for obstacle avoidance and a C-shaped gripper. The gripper is provided with a
micro-switch which is activated when a certain number of pucks are pushed.

We can summarize the robot’s behaviour with the following simple rules.
The robot moves in the arena in a straight line looking for pucks and avoiding
obstacles (walls or teammates). If the gripper is pushing three or more pucks,
the micro-switch triggers a puck-dropping behaviour: the robot leaves the pucks
in place and resumes the searching behaviour. Notice that, because pucks and
obstacles are distinguished by orthogonal sensory channels, clusters are
recognized and accessible from any angle. The resulting cluster shape is
therefore approximately circular.
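The corresponding rule set for Beckers et al.’s robots is even shorter. In this hypothetical Python sketch the threshold of three pucks comes from the text; checking the micro-switch before obstacle avoidance is an assumption, since the text does not state the priority between the two behaviours.

```python
PUCK_THRESHOLD = 3  # the micro-switch trips when three or more pucks are pushed


def beckers_step(pucks_in_gripper: int, obstacle_ahead: bool) -> tuple[str, int]:
    """Return (action, pucks still in the gripper) for one control step.

    The robot cannot count pucks; `pucks_in_gripper` only stands in for the
    pressure that eventually closes the micro-switch.
    """
    if pucks_in_gripper >= PUCK_THRESHOLD:
        # Switch closed: leave the pucks in place and resume searching.
        return "leave_pucks_and_resume_search", 0
    if obstacle_ahead:
        return "avoid", pucks_in_gripper
    return "move_straight", pucks_in_gripper
```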



2.2 Probabilistic Model 

The central idea of the probabilistic model is that instead of simulating robots 
moving within an environment, their activity is represented as a sequence of 
probabilistic events. Robots could therefore be seen as dice being thrown into 
the arena at each iteration, with their random location as well as their current 
state (i.e. carrying zero, one or two objects), determining their next state and the 
next state of the environment (i.e. the state of the clusters). The model takes into 
account physical sizes (arena, robots, and objects), the geometry of robot-robot 
and robot-environment interactions, the time needed to manage them, and the
sensor range for detecting objects, walls or teammates. An interesting feature of 
this method is that the probabilistic model is easy to implement because it is 
closely related to the flowchart of the robot controllers. 
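The "robots as dice" idea can be illustrated with the first stochastic process alone: throw a robot at a uniformly random position and check whether it lands in a detection area. The Python sketch below assumes a disc-shaped detection area (a simplification of the geometry in Fig. 4) and a radius of 0.85 + 1.7 cm read off Table 1 (seed radius plus seed detection range); the resulting area, π · 2.55² ≈ 20.4 cm², agrees with the value of the detection surface of a seed used in the text.

```python
import math
import random


def throw_robot(rng: random.Random, arena_side: float) -> tuple[float, float]:
    """One 'die throw': a uniformly random robot position in a square arena."""
    return rng.uniform(0.0, arena_side), rng.uniform(0.0, arena_side)


def in_detection_area(pos: tuple[float, float], centre: tuple[float, float],
                      det_radius: float) -> bool:
    """Simplifying assumption: a detection area is a disc around the object."""
    return math.hypot(pos[0] - centre[0], pos[1] - centre[1]) <= det_radius


def estimate_find_probability(n_throws: int, arena_side: float,
                              det_radius: float, seed: int = 1) -> float:
    rng = random.Random(seed)
    centre = (arena_side / 2, arena_side / 2)   # a seed away from the walls
    hits = sum(in_detection_area(throw_robot(rng, arena_side), centre, det_radius)
               for _ in range(n_throws))
    return hits / n_throws


r = 0.85 + 1.7                          # cm, assumed from Table 1
analytic = math.pi * r ** 2 / 80 ** 2   # detection area / Khepera arena area
estimate = estimate_find_probability(200_000, 80.0, r)
```

The Monte Carlo estimate converges to the area ratio, which is why the model can use the ratio directly instead of simulating positions.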

In the considered aggregation process, the transition probabilities among
the different states of each robot controller are conditioned by three (four for
the Khepera experiments) stochastic processes. The first stochastic process assigns
a random position to the robot. If this position is inside the detection area of a
cluster, the second random process is started. Depending on the robot state, the
size of the found cluster is incremented or decremented by one or two objects if
the number delivered by the second random process falls within the incrementing
or decrementing region (calculated with the values of Fig. 4c and Fig. 5c). Notice
that Khepera and Beckers et al.’s robots differ not only from a morphological
but also from a sensorial, and therefore behavioural, point of view. On the one hand,
when a Khepera notices an object, it always rotates in place until the object is in
Fig. 4. Khepera (scale 1:10). a) Geometrical representation of the cluster
incrementing probability. The ratio between the identification perimeter (light grey
arc) and the total detection perimeter of the cluster represents the probability of
incrementing the cluster size by one seed. b) Geometrical representation of the cluster
decrementing probability. In order to decrement the size of the cluster by one seed, the
robot first has to detect the seed at the cluster tip, as in Fig. 4a, and then grasp it
(the angle from which a seed can successfully be grasped from a cluster is slightly
smaller than its detection angle, see the dark grey arc in Fig. 4b). c) The above-mentioned
geometrical considerations are translated into probabilities of incrementing or
decrementing the cluster size given the state of the robot (carrying a seed or not), once
the cluster has been found




Fig. 5. Beckers et al. (scale 1:20). a) Geometrical representation of the cluster
incrementing probability. The ratio between the grey areas and the whole detection area of
the cluster represents the probability of incrementing the cluster size by two pucks (very
light grey) or one puck (light grey). b) Geometrical representation of the cluster
decrementing probability. The ratio between the grey areas and the whole detection area of
the cluster represents the probability of decrementing the cluster size by one puck (dark
grey) or two pucks (very dark grey). c) The above-mentioned geometrical considerations
are translated into probabilities of incrementing or decrementing the cluster size given
the state of the robot (pushing one puck, pushing two pucks or unloaded), once the
cluster has been found
front of it. It is able to accomplish this behavior because it is equipped with a
belt of IR sensors. As a consequence, the modifying probabilities are calculated
from the perimeters of clusters. On the other hand, this behavior is not accomplished
by Beckers et al.’s robots because the gripper can only measure the pressure
exerted by the pucks, not the contact point. As a consequence, these
robots access the clusters from any point and any angle, and we have to consider
surfaces rather than perimeters to calculate the modifying probabilities. The
third stochastic process, the interference with other teammates, can occur
during the search as well as during object pick-up or drop activity. If the random
position assigned to the robot is inside the detection area of one of the
teammates, the robot’s activity is frozen for a given number of iterations
corresponding to the real time lapse needed to avoid the teammate. The fourth
stochastic process, implemented only in the Khepera experiments’ modelling,
takes into account the object distinguishing efficiency (0.89) of the implemented
controller. Each random process is carried out for each robot independently
before the next iteration of the program is started.
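The per-iteration logic described above can be sketched for the Khepera case (one seed at a time) as follows. This is a minimal illustration, not the authors' program: the increment and decrement probabilities are hypothetical placeholders for the geometric values of Fig. 4c, a single `p_find_cluster` per cluster replaces the size-dependent detection perimeters, the order in which the processes are checked is an assumption, and a seed that is not recognised is assumed to be treated as an obstacle. The freeze lengths are roughly 3 s and 10 s divided by F ≈ 0.46 s/iteration.

```python
import random
from dataclasses import dataclass


@dataclass
class Robot:
    carrying: bool = False
    frozen: int = 0          # iterations left before the robot can act again


# Hypothetical placeholders (the real values follow from Fig. 4c and F).
P_INCREMENT = 0.6
P_DECREMENT = 0.4
DISTINGUISH_EFFICIENCY = 0.89
AVOID_ITERATIONS = 6         # about 3 s / F
MODIFY_ITERATIONS = 22       # about 10 s / F


def step(robot: Robot, clusters: list[int], rng: random.Random,
         p_find_cluster: float, p_meet_teammate: float) -> None:
    """Advance one robot by one iteration; `clusters` holds the cluster sizes."""
    if robot.frozen > 0:                       # still busy with an earlier action
        robot.frozen -= 1
        return
    # Interference: landing near a teammate freezes the robot while it avoids.
    if rng.random() < p_meet_teammate:
        robot.frozen = AVOID_ITERATIONS
        return
    # Random position: does it fall inside some cluster's detection area?
    if not clusters or rng.random() >= p_find_cluster * len(clusters):
        return
    idx = rng.randrange(len(clusters))
    # Distinguishing efficiency: the seed may be mistaken for an obstacle.
    if rng.random() >= DISTINGUISH_EFFICIENCY:
        robot.frozen = AVOID_ITERATIONS
        return
    # Modify the cluster size according to the robot state.
    if robot.carrying and rng.random() < P_INCREMENT:
        clusters[idx] += 1
        robot.carrying = False
        robot.frozen = MODIFY_ITERATIONS
    elif not robot.carrying and rng.random() < P_DECREMENT:
        clusters[idx] -= 1
        robot.carrying = True
        robot.frozen = MODIFY_ITERATIONS
        if clusters[idx] == 0:
            clusters.pop(idx)                  # an emptied cluster never re-forms
```

A full run simply calls `step` for every robot at every iteration, starting from a list of single-seed clusters, and records the cluster sizes over time.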

In order to convert the number of iterations into time, we consider the time
needed by a robot to sweep the detection area of a single seed (equivalent to
one iteration) and calculate a fixed conversion factor F as follows (see [7] for
further details), where $A_{det,object}$ is the detection surface of a single object,
$W_{robot}$ the robot width and $v_{robot}$ the mean forward velocity of the robots:

$$F = \frac{A_{det,object}}{v_{robot}\, W_{robot}} \quad [\mathrm{s/iteration}] \qquad (1)$$

For reference, the probabilistic simulation runs about 4000 times faster than
real time on a Sun Ultra 1 with five robots.



Probabilistic Modelling of Experiments with Khepera. The cluster modifying
probabilities for Khepera are depicted in Fig. 4. Every robot can increment
or decrement the size of a cluster by one seed at a time. With the numerical
values used ($A_{det,seed}$ = 20.4 cm², $v_{robot}$ = 8.0 cm/s, $W_{robot}$ = 5.5 cm), the
resulting conversion factor is F = 0.46 s/iteration. The conversion factor is also used
for taking into account the duration of the robots’ actions. For instance, with
the implemented discriminating behavior, it takes 3 s to avoid an obstacle and
10 s to modify the size of a cluster. The algorithm translates these time
lapses into numbers of iterations during which the searching behavior is frozen.



Probabilistic Modelling of Beckers et al.’s Experiments. The cluster
modifying probabilities for Beckers et al.’s robots are depicted in Fig. 5. Every
robot can increment or decrement the size of a cluster by one or two pucks at a
time. With the numerical values used ($A_{det,puck}$ = 12.6 cm², $v_{robot}$ = 27 cm/s,
$W_{robot}$ = 17 cm), the resulting conversion factor is F = 0.027 s/iteration. The
robot takes about 0.5 s to avoid an obstacle and about 1.6 s to modify the size
of a cluster.
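Equation (1) and the values quoted for both set-ups can be checked with a few lines of Python. Rounding action durations to whole iterations is an assumption, since the text does not say how fractional iterations are handled.

```python
def conversion_factor(a_det_cm2: float, v_cm_s: float, w_cm: float) -> float:
    """Eq. (1): seconds of real time represented by one iteration."""
    return a_det_cm2 / (v_cm_s * w_cm)


def to_iterations(seconds: float, f: float) -> int:
    # Rounding to the nearest whole iteration is an assumption.
    return round(seconds / f)


f_khepera = conversion_factor(20.4, 8.0, 5.5)     # seed, Khepera
f_beckers = conversion_factor(12.6, 27.0, 17.0)   # puck, Beckers et al.

print(f"{f_khepera:.2f} {f_beckers:.3f}")                           # 0.46 0.027
print(to_iterations(3, f_khepera), to_iterations(10, f_khepera))    # 6 22
print(to_iterations(0.5, f_beckers), to_iterations(1.6, f_beckers)) # 18 58
```

The much smaller F for Beckers et al.’s set-up reflects the faster and wider robots sweeping a small puck detection area.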
3 Results

We have carried out several sets of experiments with different numbers of robots
and durations (see Table 2). The plots show the mean over all the runs, and error
bars represent the standard deviation around the mean.



3.1 Experiments with Khepera 

All experiments are carried out with the three different implementations (real robots,
Webots simulator and probabilistic model) and different group sizes, except for
the longest experiments, which have been realized only with a group of three real
robots. It is worth noting that the roughly 18 hours needed for all the runs of the
long experiments with real robots were performed without any recharging
break, thanks to a special tool [6, 7] which allows the robots to be powered from the
floor (see Fig. 2).




Fig. 6. Aggregation evolution with an increasing number of robots (one to five). a) Results
of the experiments with real robots; b) results of the Webots simulator; c) results of
the probabilistic modelling



In order to check whether or not there is a significant difference between data 
collected from the simulations and the real experiment, we performed a Mann- 
Whitney test [4] on the distributions of mean cluster size at the end of the shorter 

Table 2. Characteristics of the experiments carried out. mcs = mean size of the
clusters, bcs = size of the biggest cluster, nc = number of clusters

Set-up           Robots   Duration   Nb of repetitions (real, Webots, prob. model)   Measurement     Figures
Khepera                   16 min     5                                                               6
Khepera          3                   5                                               mcs, bcs, nc    7
Beckers et al.
Fig. 7. Evolution of the aggregation process over ten hours. The results of real robots,
Webots simulator and probabilistic modelling are overlapped in the same plot. Notice
that the resulting plots are averages of different numbers of runs with different
sample times (15 minutes for real robots, 2 minutes for Webots and probabilistic
modelling). In order to obtain a plot with real robots extended over a pre-established,
wide time window, we recorded the aggregation noise during two hours after a single
cluster arose. This allowed us to stop a given run once all seeds were gathered in a
single cluster and to extend the run data with the recorded noise



experiments and on the time needed to gather all seeds in the longer experiments.
With this non-parametric test, we compared the distributions of pairs of data sets.
The results show no statistically significant difference (p < 0.05) over 27 data
sets, except in three cases (Fig. 6, one robot, Webots vs. prob. model, and three
robots, real robots vs. prob. model; Fig. 8a, nine robots, Webots vs. prob. model).
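Since the single-run data are not reproduced here, the following Python sketch only illustrates the kind of comparison described: a two-sided Mann-Whitney U test using the large-sample normal approximation. This is a simplification; the authors may have used exact tables or a statistics package, and the sample values in the usage line are hypothetical.

```python
import math


def mann_whitney_u(a: list[float], b: list[float]) -> tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U for sample a, two-sided p-value). Ties receive average ranks;
    no tie or continuity correction is applied, so p-values for very small
    samples are only indicative.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    values = a + b
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1        # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n1, n2 = len(a), len(b)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical final mean cluster sizes from two implementations.
u, p = mann_whitney_u([4.0, 5.5, 3.8, 6.1, 4.9], [4.4, 5.0, 4.1, 5.8, 5.2])
significant = p < 0.05
```

Being rank-based, the test makes no normality assumption, which suits the high-variance, possibly skewed distributions produced by these experiments.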

Fig. 6 shows the initial clustering evolution for a group of one to five robots.
Although the results of both simulations, Webots and probabilistic model, are
slightly smoother than those of the real robots (they are averages over a larger
number of experimental replications), the three plots are in good agreement.

Fig. 7 illustrates longer experiments with a group of three robots. Once again,
the sensor-based simulation and the probabilistic model are in good agreement.
Real robot data show a slightly faster evolution of the aggregation process,
but this is mainly due to the reduced number of runs in comparison to simulation.
In any case, the mean real robot data lie within one standard deviation of the
mean of the simulation data.

Fig. 8a shows a comparison of Webots simulator and probabilistic model 
based on mean and variance of the time needed by the robots to gather all the 
seeds in a single cluster. 



3.2 Beckers et al.’s Experiments 

The comparison between the results of our probabilistic model and the ones 
reported in [1] is shown in Fig. 8b. There is a good agreement between both 
results based on mean and variance of the time needed by the robots to gather 

Fig. 8. Comparison between the results of simulations and real robots on the time
needed to gather all the objects in a single cluster. a) Khepera experiments; b) Beckers
et al.’s experiments



all the pucks in a single cluster. However, we were not able to perform any
statistical test because single-run data were no longer available.

4 Discussion and Conclusion 

This paper presented a simple and flexible probabilistic model which has been
applied to two completely different robotic platforms. The good agreement of
the clustering dynamics described by the probabilistic model with data collected
with real robots or more sophisticated simulators such as Webots shows that the
proposed minimalist model incorporates the essential characteristics of the
clustering problem. These characteristics have been identified as the probabilities of
modifying the size of clusters and the probabilities of interfering with other
robots. These probabilities are essentially based on geometrical considerations 
and can be derived from the sensory capacity of single robots. Once these prob- 
abilities are established and interaction time lapses measured, the probabilistic 
model has the interesting feature of being a prediction tool of the same quality as 
a detailed sensor-based simulation, while being significantly simpler and faster. 

Another interesting feature of the model is that the identification of the 
primary characteristics of this particular clustering problem is a step forward 
towards the understanding of collective mechanisms underlying clustering in 
general. For instance, with the Khepera set-up, the model predicted, before
the long experiments were carried out, that it was possible to gather all the seeds
in a single cluster if enough time is available. This prediction rests on two
properties of the process: clusters can be eliminated irreversibly during aggregation
(once its last seed is picked up, a cluster cannot re-form, since seeds are only
dropped next to other seeds), and aggregation is enhanced by a positive building
gradient (the incrementing probability is consistently greater than the decrementing
probability, see Fig. 4). The prediction has since been confirmed experimentally.
This building gradient is even more pronounced in the experiments of Beckers et al.:
the greater the cluster size, the more stable the cluster is. This mechanism, which
depends only on the robot-environment interaction, speeds up the team’s performance
(Beckers et al.’s robots are able to cluster 4 times more pucks in half the time needed
by a Khepera team of the same size).

Finally, the results show that in this kind of experiment, where the coordination
between robots is essentially probabilistic, the data obtained present a high
variance. One possible way to reduce variance and increase coordination, while
keeping the team control fully distributed, would be to introduce a form of
explicit local communication (signalling or symbolic communication).
Abstract. This paper explores the application of stigmergy as a mechanism for
the control and coordination of cluster building with 10 simple, autonomous, 
mobile real robots. Social insects collectively carry out extraordinary tasks 
using simple reactive rule sets with little or no memory and without recourse to 
internal world models over which they reason. Control and coordination of 
behaviour can be mediated externally through the environment. In particular, 
building tasks may employ the configuration of each construction phase to cue 
the next phase. This paper shows how, using physical robots, the deterministic 
clustering algorithm of Beckers et al [1994] can be modified to create a cluster 
of objects against the arena wall. The modification takes the form of either 
applying probabilistic rule selection or by altering the robot sensor morphology. 
These two approaches illustrate that equivalent qualitative emergent 
consequences can be generated algorithmically or by exploiting the domain 
physics. 



1 Introduction 

Social insects, such as ants, provide an existence proof of minimalist collective
control mechanisms. They exhibit characteristics such as decentralized control, self-
organization, redundancy and indirect communication through the environment
(stigmergy). This paper develops the idea that these minimalist collective
characteristics could be attractive to engineers in the design and implementation of
multiple robots. For example, such mechanisms may prove usable by designers
of milli- and micro-scale robots. At such scales there may be considerable limitations
on communication, sensing, mobility and computation, as well as power. It makes sense
therefore to consider if there are exploitable characteristics arising from the tight 
coupling between the environment and the action repertoire of embodied agents. In 
particular this study explores collective building activity of many simple robots where 
the global task achieving behaviour is not explicitly represented, that is, no robot has 
an internal representation of the task. The task is achieved as the emergent 
consequence of the robots executing simple reactive rules triggered by local 
conditions. In this context the paper focuses on how stigmergy can be employed to 
control and coordinate the global building behaviour without direct communication. 
The paper is organised as follows. Section 1 reviews the employment of stigmergy in
some social insects and how it can be used for real robots. In section 2, the robots and 
the environment used in this work are described. Section 3 discusses the origins and 
canonical form of a puck clustering algorithm in real robots. Sections 4 and 5 describe 
the experiments on non-seeded clustering and present the results, which are discussed 
in Section 6. 



1.1 Stigmergy 

The concept of stigmergy was first introduced by Grassé [1959]. He described the
indirect communication taking place among individual termites through dynamically 
evolving features of a structure. His work focused on the mechanism for nest 
construction where communication and subsequent decentralized control was 
mediated through the environment rather than through a process of direct 
communication. Bonabeau et al [1994] refer to stigmergy employing stimulating 
configurations which trigger the building action of a termite worker, which transforms 
the current configuration of the structure into the next. The new configuration may, in 
turn, trigger another action of the same or another termite to modify the structure thus 
creating the next configuration. Thus modification to the environment provides the 
cue for subsequent changes in behaviour and therefore communication is effected 
through the environment. Holland has suggested an interesting further classification 
of stigmergy into active and passive categories [Holland 1996]. However, the term 
'sematectonic' has been employed by Wilson [1975] to imply cue based stigmergy in 
the domain of construction. Bonabeau et al [1994] also use the term sematectonic in 
the description of the use of cue based stigmergy in the construction of comb lattices 
in social wasps. They use simulation to show how the state of the simulated nest can 
be used to trigger the next action. In this way no a-priori blue-print is required since 
the next stage of construction is driven by the current state of the construction. 
However, some researchers, such as Bonabeau et al [1994], do point out that there 
still are problems in understanding how action triggering stimuli are organized in 
space and time to ensure that global building behaviour is coherent. Downing and 
Jeanne [1988] noted that some (non-linear) building activities of wasps can be carried
out at the same time but not necessarily performed in a set sequence. Such non-linear 
construction is more likely to involve a greater array of cues for an animal to filter 
simply because there are more potential construction sites. These multiple cues, which 
may change with time, may include re-evaluating particular characteristics of earlier 
construction such as petiole length in a wasp nest, environmental cues such as 
temperature and humidity, and cues from developing young such as the growth of 
larvae. 



1.2 Employing Stigmergy 

Stigmergy has been observed to effect indirect communication in experiments 
employing the picking up or placing down of objects. The removal or deposition of 
such objects can alter the consequent behaviour of other (or the same) animal or 
robot. In this way a form of global task control can be realised. In the examples below 
a stigmergic mechanism is realized by employing the removal or deposition of objects
which consequently 'controls' the global task achieving behaviours. 

Franks et al [1992], Franks & Deneubourg [1997] looked at the construction of 
peripheral walls accomplished by worker ants in Leptothorax colonies that nest in 
narrow slits in rocks. They showed in a computer simulation model how ants could 
use a simple self-organizing procedure based largely on bulldozing. Each builder, by 
pushing its building block into others, adds its work to existing structures. Bulldozing 
workers do not communicate directly but can communicate efficiently via the 
products of their successful labours. Beckers et al have incorporated these ideas into a 
group of puck pushing robots in which each robot pushes pucks into others thus 
adding its work to existing structures. 

Deneubourg et al [1991] presented in their paper 'The dynamics of collective
sorting: robot-like ants and ant-like robots' a simulation showing that simple agents,
specified in terms which could equally well apply to ants or robots, could achieve two 
generic tasks of fundamental importance: the clustering of scattered objects of a 
single type, and the grouping and sorting of objects of two different types. For sorting, 
the agents needed to be able to sense the local densities of the different types of brood 
items. This was achieved by using a short-term memory augmented by recognition of 
the type of any brood item an ant was carrying. Clustering was the result of the 
mechanism operating on only a single type of item. In this way indirect 
communication through the environment was achieved and behaviour was 'controlled' 
by the consequences of cumulative deposition and removal of objects, that is, a 
stigmergic mechanism was employed. 
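The text above describes the sorting model qualitatively. In Deneubourg et al. [1991] the pick-up and drop probabilities take the form sketched below in Python, where f is the fraction of items the agent perceives nearby (estimated from its short-term memory) and k1, k2 are constants; the default values 0.1 and 0.3 are illustrative.

```python
def p_pick(f: float, k1: float = 0.1) -> float:
    """Probability that an unladen agent picks up an item.

    High when few similar items are around (f small), low inside clusters.
    """
    return (k1 / (k1 + f)) ** 2


def p_drop(f: float, k2: float = 0.3) -> float:
    """Probability that a laden agent drops its item.

    Low where items are sparse, high where many similar items are perceived.
    """
    return (f / (k2 + f)) ** 2


# For sorting, f is computed separately for each item type, so an agent
# carrying an item of type A uses the perceived fraction of type-A items.
```

Isolated items are thus picked up readily and deposited preferentially near existing piles, which is the positive feedback that drives clustering.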

Altenburg & Pavicic [1993] demonstrated puck clearance in an arena with 6 Lego
robots, but puck clustering was first achieved by Beckers et al [1994], who showed
that puck clustering could be achieved with an extremely simple mechanism. They
used physical robots which were unable to detect whether or not they were moving
any objects, which had no memory, and which could sense the local density of objects
only as being below or above a fixed threshold. The clusters formed were always
away from the arena perimeter and are referred to as np-clusters (i.e. non-peripheral).
Melhuish et al [1998] reported a further robot study which demonstrated that this
technique could also be extended to the sorting of two types of object (frisbees) with a
simpler mechanism than that proposed in the model of Deneubourg et al. This paper
explores how indirect communication through the environment in the guise of
stigmergy can be further exploited in tasks involving the clustering of objects. It is
argued here that 'artificial' self-organization emerges at a meta-level as a
consequence of simple rule sets acting on the environment and of the indirect
communication (stigmergy) which results from those actions that change the
environment and subsequently affect the actions or behaviours of the agents. The
resulting global self-organized structure is therefore a product of the tightly coupled
interactions between the agents and their environment, and is therefore sensitive to the
nature of the domain physics, such as friction, compliance and texture.

This paper shows how, using physical robots, the deterministic clustering
algorithm of Beckers et al [1994] can be modified to create a cluster against the arena
wall. The modification takes the form of either applying probabilistic rule selection or
altering the robot sensor morphology. These two approaches illustrate that
equivalent qualitative emergent consequences can be generated algorithmically or by
exploiting the domain physics. Both methods emphasise the importance of the tight
coupling between local minimalist actions executed by the robots and the
environment.



2 Materials and Methods 

The robots and experimental environment used in this study were designed to
investigate a range of social insect behaviour, with particular emphasis on
building tasks. The design of the robot system was inspired by the work of
Deneubourg et al [1990] on corpse clustering in the ant Pheidole pallidula and of Franks
and his collaborators on the ant species Leptothorax tuberointerruptus, Leptothorax
unifasciatus, and related species (Franks and Sendova-Franks [1992], Franks et al
[1992]). These ants live in small colonies (typically with a few hundred members) in
cracks in rock. This constrained environment means that they are behaviourally
adapted to life in two dimensions. It is therefore possible to study the behaviour of a
colony by providing it with a particular two-dimensional habitat (the space between
two glass slides), which allows unrivalled opportunities for observation and
recording.

Leptothorax is also particularly suitable as a model for robotic investigations of some
collective behaviours, because current robots, which are wheeled, operate well in two
dimensions but are extremely limited in their ability to operate in the third
dimension. Leptothorax building behaviours tend to involve the movement of single
lumps of material (carborundum grit of regular dimensions is usually provided by the
experimenters), which are placed next to other lumps rather than piled on top of them.
A similar level of functionality can be achieved in robots by simply pushing and
pulling building blocks (referred to as pucks but actually modified frisbees) around
the floor, rather than lifting them and piling them up.

The robots, known as U-bots, were designed and built in our laboratory to provide a
flexible and capable platform for a range of collective robot experiments in a 10 m
diameter arena. The design and operation of the robots are described in detail in
Melhuish et al [1998].



3 The Canonical Rule Set 

Creating np-clusters of small pucks was achieved by Beckers et al [1994] with up to
six robots and was repeated by Melhuish et al [1998] using 10 robots with larger
pucks and non-deformable boundary walls. The rule set for np-clustering of Melhuish
et al is taken as the canonical form and serves as the basis for further modifications in the
experiments described below. The canonical form may be defined algorithmically as:
Rule 1:

    if (gripper pressed & Object ahead) then
        make random turn away from object

Rule 2:

    if (gripper pressed & no Object ahead) then
        reverse small distance, i.e. drop the puck,
        make random turn left or right

Rule 3:

    go forward



Fig. 1. The Canonical Rule Set 
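Read as a program, the canonical rule set is a priority-ordered mapping from two binary readings (scoop/gripper switch pressed, object detected ahead) to one of three actions. The Python sketch below assumes both readings are already reduced to booleans; the `Action` names are hypothetical labels, and turning, reversing and motor control are not modelled.

```python
from enum import Enum


class Action(Enum):
    TURN_AWAY = "make random turn away from object"
    DROP_AND_TURN = "reverse small distance (drop puck), random turn"
    FORWARD = "go forward"


def canonical_rule(gripper_pressed: bool, object_ahead: bool) -> Action:
    """Select an action with the canonical 3-rule set of Fig. 1.

    Rules are tested in order and the first whose condition holds fires.
    """
    if gripper_pressed and object_ahead:       # Rule 1: keep puck, turn away
        return Action.TURN_AWAY
    if gripper_pressed and not object_ahead:   # Rule 2: drop puck, turn
        return Action.DROP_AND_TURN
    return Action.FORWARD                      # Rule 3: default behaviour


# Enumerate every sensor combination to show the full decision table.
for pressed in (False, True):
    for ahead in (False, True):
        print(pressed, ahead, canonical_rule(pressed, ahead).name)
```

With the scoop released, the robot always goes forward whatever the obstacle sensor reports, which is how the rule set is written in Fig. 1.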



4. Experiment 1: Unseeded Wall Clustering - Applying a Probabilistic Strategy 

The rules to create the single 'central' cluster described above were applied
deterministically. It was reasoned that if the likelihood of pucks being left at the
wall could be increased in some manner, this could enhance the possibility of
cluster formation at the wall. If the probability of dropping a puck against the wall is
too high, then all pucks would be expected to be deposited at the periphery,
but not necessarily in a single cluster. In contrast, if the probability is too low, then a
central cluster is likely to form (a drop probability of 0.0, i.e. a retention probability
p = 1.0 in the modified rule set below, is equivalent to the canonical 3-rule clustering
algorithm shown in Figure 1). It was reasoned that a 'sweet' spot existed in
the probability continuum where sufficient pucks could be dropped to act as a seed.
At such a probability there would be a dynamic equilibrium between the tendency to
drop pucks at the wall and the tendency to capture them again, such that they are
recycled. At this 'sweet' spot a puck would be 'allowed' to remain against the wall
long enough for other pucks to be deposited against it.

To implement the probabilistic approach to depositing a puck, the canonical 3-rule
clustering rule set was modified so that Rule 1 included a stochastic element, as shown
in Figure 2. The modified Rule 1 allows one of two behaviours to be executed. When
a robot collides with a wall (or another robot), the scoop is depressed and the robot
registers the presence of an object ahead. With probability p the 'normal' Rule 1
behaviour of making a random turn away from the object is carried out. Conversely,
with probability 1-p the puck is dropped, since the robot reverses a small distance and
then makes a random turn.
Rule 1:

    if (gripper pressed & Object ahead) then
        with a probability p
            make random turn away from object
        else (with probability 1-p)
            reverse small distance, i.e. drop the puck,
            make random turn left or right

Rule 2:

    if (gripper pressed & no Object ahead) then
        reverse small distance, i.e. drop the puck,
        make random turn left or right

Rule 3:

    go forward



Fig. 2. 3-Rule 'Stochastic' Clustering Algorithm
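The stochastic variant changes only Rule 1. A minimal Python sketch, assuming the robot can draw a uniform random number at each collision with a wall (or robot); the function names and string labels are illustrative, not taken from the robots' controller. The Monte Carlo helper confirms that the puck is kept in a fraction p of such collisions, so p = 1.0 reproduces the canonical rule set and p = 0.0 always drops the puck.

```python
import random


def stochastic_rule(gripper_pressed: bool, object_ahead: bool,
                    p: float, rng: random.Random) -> str:
    """3-rule 'stochastic' clustering of Fig. 2; p = probability of retaining."""
    if gripper_pressed and object_ahead:
        # Rule 1: with probability p keep the puck and turn away;
        # otherwise (probability 1 - p) drop it exactly as Rule 2 does.
        return "turn_away" if rng.random() < p else "drop_and_turn"
    if gripper_pressed:
        return "drop_and_turn"   # Rule 2
    return "forward"             # Rule 3


def retention_rate(p: float, trials: int = 100_000, seed: int = 1) -> float:
    """Fraction of simulated wall collisions in which the puck is kept."""
    rng = random.Random(seed)
    kept = sum(stochastic_rule(True, True, p, rng) == "turn_away"
               for _ in range(trials))
    return kept / trials


print(retention_rate(0.88))  # close to 0.88
```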



4.1 Materials and Methods 

44 pucks were set out in a regular pattern. The overhead camera captured a frame
every 5 minutes. The end condition was considered to be reached when 40
pucks (over 90%) were within a puck radius of each other. The following values of p,
the probability of the puck being retained, were trialled: 0.0, 0.5, 0.8, 0.85, 0.88, 0.9,
0.95 and 1.0. The trials with real robots proved to be extremely time-consuming and
thus, initially, only one trial was conducted for each setting of p using 10 robots. More
trials were then run at the 'sweet' spot. In some cases the trials were terminated when
two large clusters had formed. From previous observations it was noted that such a
configuration could remain for a very long period before one single cluster would
eventually form. Such a configuration was therefore considered to indicate that a
single cluster would be formed.
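The end condition can be checked on puck positions extracted from an overhead frame. The Python sketch below is one possible reading of 'within a puck radius of each other': it treats two pucks as linked when the gap between their edges is at most one puck radius, and grows clusters by single linkage (a breadth-first search). The linking rule, the coordinate input and the function names are assumptions for illustration.

```python
import math
from collections import deque


def largest_cluster(centres, radius, gap=None):
    """Size of the largest single-linkage cluster of pucks.

    ASSUMPTION: two pucks are linked when the gap between their edges is at
    most `gap` (default: one puck radius), i.e. centre distance <= 2r + gap.
    """
    if gap is None:
        gap = radius
    link = 2 * radius + gap
    n = len(centres)
    seen = [False] * n
    best = 0
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue, size = deque([start]), 0
        while queue:                      # breadth-first search of one cluster
            i = queue.popleft()
            size += 1
            for j in range(n):
                if not seen[j] and math.dist(centres[i], centres[j]) <= link:
                    seen[j] = True
                    queue.append(j)
        best = max(best, size)
    return best


def end_condition_met(centres, radius, needed=40):
    """True when at least `needed` pucks (40 of 44 here) form one cluster."""
    return largest_cluster(centres, radius) >= needed
```

Because linking is transitive, a chain of pucks counts as one cluster even if its ends are far apart. This matches the visual notion of a single cluster, but it may differ from how the frames were actually scored.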



4.2 Results 

Table 1 sets out the results for the experiment. As expected, probabilities of 0 and 1
produce pucks around the periphery (Figure 3h) and a central cluster (Figure 3a)
respectively. With p = 0.5 pucks are still strewn around the periphery (Figure 3g).
However, at p = 0.8 and 0.85 a major cluster was formed at the edge, but with some 15
singletons strewn around the periphery (Figure 3f and e). The size of the major cluster
against the wall reached the acceptable final configuration of 40 pucks with p = 0.88
(Figure 3d). Probabilities of p = 0.9 and 0.95 produced two main central clusters after
5 hours and 2.5 hours respectively. From earlier experience and observation it was
concluded that these would eventually form one large central cluster, and the runs were
halted (Figure 3c and b respectively).

Thus the experimental evidence suggests that the 'sweet' spot lay very near p = 0.88.
Five more trials were therefore conducted with p = 0.88. Of the extra trials, 3 formed a
single cluster at the edge (9hrs 25m, 10hrs 35m and 13hrs 10m) and two formed
central clusters: one with 40 pucks (6hrs 0m) and the other with some 9 pucks strewn
around the periphery (7hrs 35m). Thus, of the total of 6 trials, 4 formed acceptable edge
clusters, one formed an acceptable central cluster and one had nearly formed an
acceptable central cluster.
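The tally of the six p = 0.88 trials can be reproduced directly from the figures reported above (the first trial's time is the 9hrs 5m entry in Table 1). This short Python snippet counts the edge clusters and averages their completion times; the outcome labels are hypothetical.

```python
# Outcomes of the six p = 0.88 trials (times converted to minutes).
trials = [
    ("edge", 9 * 60 + 5),           # initial trial (Table 1)
    ("edge", 9 * 60 + 25),
    ("edge", 10 * 60 + 35),
    ("edge", 13 * 60 + 10),
    ("central", 6 * 60),            # central cluster of 40 pucks
    ("near-central", 7 * 60 + 35),  # central, about 9 pucks still at periphery
]

edge_times = [t for kind, t in trials if kind == "edge"]
success_rate = len(edge_times) / len(trials)
mean_edge = sum(edge_times) / len(edge_times)
print(f"{len(edge_times)}/{len(trials)} edge clusters ({success_rate:.0%}), "
      f"mean time {int(mean_edge // 60)}h {mean_edge % 60:.0f}m")
# prints: 4/6 edge clusters (67%), mean time 10h 34m
```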



4.3 Conclusions 

The results demonstrate that the stigmergic mechanism employed in the Beckers et al
clustering work can be modified to control whether a single cluster is formed in the
arena or at its edge. Low p produces no clustering, only pucks strewn along the
peripheral wall. High p produces central clusters, and with an intermediate p a single
cluster at the wall can be produced. These results are qualified in that not all trials were
successful (4 out of 6); the choice of p may need to be refined further and/or the
very nature of the task, with its concomitant noise and randomness, cannot guarantee
100% success.



Retain prob. p   Result
1.0              Leads to a central cluster after 6 hours 35 minutes.
0.95             Stopped when 2 main central clusters had formed, after 2.5 hours.
0.9              Stopped when 2 main central clusters had formed, after 5 hours.
0.88             1 cluster formed at the edge at 9hrs 5m and remained stable up to 11hrs 20m.
0.85             1 major cluster formed at the edge and approx. 15 single pucks around
                 the periphery. Stopped after 11 hours.
0.8              1 major cluster formed at the edge and approx. 15 single pucks around
                 the periphery. Stopped after 11 hours.
0.5              All pucks taken to the periphery after 40 minutes, but no single cluster
                 formed. Stopped at 11 hrs.
0.0              All pucks taken to the periphery after 3hrs 15m, but no single cluster
                 formed. Stopped at 3hrs.

Table 1. The Probabilistic Continuum for Unseeded Clustering 




[Figure 3 panels: (a) p = 1.0, (b) p = 0.95, (c) p = 0.9, (d) p = 0.88, (e) p = 0.85, (f) p = 0.8, (g) p = 0.5, (h) p = 0.0]



Fig. 3. Final Configurations for the Probabilistic Continuum 



5. Experiment 2: Modifying the Robot Morphology - Computational Equivalence

When a probabilistic element was applied to the deterministic canonical 3-rule
clustering algorithm, a puck retention probability of 0.88 at the wall showed some
success in forming a cluster against the wall. It was reasoned that such a probability
of puck retention could also be generated by employing the deterministic rule set, with
the proviso that the efficiency of the obstacle detection sensors would be decreased.
Since robots execute random turns after collisions and then continue on a straight path
across the arena, the angle at which a robot collides with the wall may also be
considered essentially random. It was further reasoned that, if the effective acceptance
angle of the forward-sensing infra-red obstacle detectors could be appropriately
reduced, then a robot carrying a puck and approaching the wall at certain angles would
have the wall in the sensors' 'blind-spot' region. In such a case the robot's
scoop would be depressed and, since no obstacle would be detected, Rule 2 would be
executed, causing the puck to be deposited against the wall. A compromise has to be
reached between the 'tendencies' to drop a puck and to pick up a puck at the wall. If no
pucks are dropped (as in the canonical clustering algorithm in Figure 1), then a single
cluster will form away from the wall. If a puck is dropped whenever the wall is
encountered, then all the pucks will eventually be moved to the wall or against pucks
which themselves are against the wall. It is therefore expected that, when an appropriate
compromise between the two behaviours is realised, one cluster will eventually
become prominent and will (with a high probability) become the stable 'master'
cluster. The experiment described below tests this hypothesis.

The balance of the behaviours is achieved by reducing the range and angle of
acceptance of the collision avoidance sensors. In order to reduce the acceptance angle
of detection, the left and right eyes were disconnected and only the central eye was
used. In addition, to produce acceptable results, the range of the eye was further reduced
by attaching a single layer of translucent tape over the sensor. Thus, in contrast with
the 'standard' puck clustering set-up, a robot carrying a puck and hitting the wall
cannot see the wall from all angles. This increases the probability of dropping a puck
at the wall. With only the central sensor functioning, the 'normal' retain behaviour is
restricted to the central region: for a robot colliding with the wall on a trajectory
which enters either of the two blind regions, the puck is dropped.

The mean retain/pick-up angle for the 10 robots used, with a puck in the scoop, was
100.3° (s.d. 7.4°). This compares with the unmodified detection angle of 180°.
That is, with the three-sensor arrangement, the wall is detected whatever the angle
at which the robot collides with it and triggers the scoop.
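The blind-spot argument can be turned into a rough number. The Python sketch below assumes, purely for illustration, that the angle between a robot's heading and the wall normal at collision is uniformly distributed over ±90°, and that the wall is detected (so the puck is retained) whenever that angle lies within half the measured acceptance angle. Under that assumption the retained fraction is simply 100.3/180 ≈ 0.56 with the modified sensor and 1.0 with all three sensors. The sketch ignores the reduced sensor range from the tape and any non-uniformity in collision angles, so it should not be read as an estimate of the effective retention probability achieved in the experiment. The function names are hypothetical.

```python
import random


def wall_detected(approach_deg: float, acceptance_deg: float) -> bool:
    """True if the wall falls inside the forward sensor's acceptance cone.

    approach_deg is the angle between the robot's heading and the wall
    normal, in [-90, 90]; 0 means a head-on collision.
    """
    return abs(approach_deg) <= acceptance_deg / 2


def retain_fraction(acceptance_deg: float, trials: int = 200_000,
                    seed: int = 0) -> float:
    """Monte Carlo fraction of collisions in which the wall is detected.

    ASSUMPTION: approach angles are uniform over [-90, 90] degrees.
    """
    rng = random.Random(seed)
    seen = sum(wall_detected(rng.uniform(-90.0, 90.0), acceptance_deg)
               for _ in range(trials))
    return seen / trials


print(retain_fraction(180.0))            # three sensors: wall always seen -> 1.0
print(round(retain_fraction(100.3), 3))  # close to 100.3 / 180 = 0.557
```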



5.1 Materials and Methods 

44 pucks were set out in a regular pattern. The overhead camera grabbed a frame 
every 5 minutes. The end condition was considered to be when a configuration of 40 
pucks (over 90%) were within a puck radius of each other. Three trials were 
conducted with 10 robots. 

5.2 Results and Conclusion 

In all three trials the successful end condition of 40 pucks clustered against the wall in
a single cluster was achieved. The times to completion for the three trials were 10hrs 15m,
13hrs 30m and 14hrs 25m. Figure 4 shows the successful wall clustering after the first
trial.

It has been demonstrated that wall clustering can be achieved by altering the robot
morphology, that is, by modifying the characteristics of the obstacle sensors. By
employing the same deterministic 3-rule algorithm which creates a central cluster and,
very importantly, reducing the forward obstacle detection capability, wall clustering can
also be induced. The subtle alteration of the sensors causes the deterministic rule set to
produce the same gross behaviour near the wall as the
probabilistic mechanism described earlier. That is, a robot with a puck colliding with
the wall is more likely to drop the puck. This increases the likelihood of other pucks
being deposited against it, which encourages puck cluster development at the wall.
Such an 'equivalence' is interesting in that it demonstrates that a deterministic rule set
can generate different outcomes on the macro scale when the 'domain physics', in the
guise of robot morphology, is altered. Thus, it may be possible to create particular
behavioural repertoires by altering the general robot morphology. Conversely, it is
interesting to consider that a deterministic rule set operating in different environments
(for example, robots bulldozing wet sand or dry sand) could generate tangibly
different outcomes. Arguably, as in this case, this could lead to a more minimalist
solution.




Fig. 4. Wall Clustering with the Canonical Deterministic Algorithm 



6 Discussion 

It has been argued here that the results of the experiments indicate that, for certain
building tasks, control and coordination of a collective of minimalist robots can be
realised by employing stigmergic mechanisms. It has been shown that changes to the
environment (as in the case of seeding with different types of seeds) or to robot
morphology (as in the example of the restricted acceptance angle of the eyes) result in
markedly different outcomes. Such observations emphasise the tight coupling and
dependence of the stigmergic mechanism between the robots' action rule set and the
environment. Such tight coupling is exploited by social insects, which
demonstrates that stigmergic 'solutions' can be evolved by natural systems. A
significant challenge for the designer of a system employing a minimalist action
repertoire and stigmergic mechanisms is that variations in the domain physics may
be either problematic or advantageous, and thus exploitable.
Abstract. Is it more efficient to use one robot or several? Will the
performance of a group of robots working on a collaborative task be
enhanced if the robots can communicate with one another? What learning
abilities should the robot(s) be provided with to adapt to a continuously
changing environment? We address these three issues in a specific
task, namely learning the topography of an environment whose features
change frequently. We propose a learning algorithm based on an
associative memory which allows a group of robots to keep an up-to-date
account of the environmental state when this changes regularly. A
probabilistic model is developed which gives an abstract representation of
the system. It is used to determine boundaries for the system’s variables
(the number of robots, the frequency of environmental changes, and the
environment’s configuration) within which the learning is successful. The
predictions of the probabilistic model are confirmed by simulations run
in Webots, a 3-D simulator of Khepera robots.



1 Introduction 

Numerous works on autonomous robot systems investigate the questions of 1)
whether it is more efficient to distribute the area of expertise needed for performing
a complicated task between several robots rather than designing a unique
expert robot [7]; 2) whether the use of explicit communication could improve
the performance of a group of robots in a collaborative task ([2, 12, 14]); and 3) what
learning abilities the robot(s) should be provided with to adapt to a continuously
changing environment [10].

We address these three issues in a specific task, namely learning the topography
of an environment whose features, the locations of objects, change
frequently. A group of worker robots constantly searches the environment. The
robots are provided with an associative memory which allows them to store the
locations of the objects as they detect them. They transmit the coordinates of the
objects’ locations to each other by locally broadcasting a location when
finding an object or when meeting another robot. The information gathered by
each robot is also transmitted to a static database robot which each robot visits
regularly. The database robot keeps an up-to-date account of the global state
of the environment, of which each robot has only partial knowledge. We study
the system’s performance in a dynamic environment, in which the locations of
the objects change with a constant frequency. We investigate the influence of the
variables of the system, namely the number of worker robots, the frequency of
environmental changes and the environment’s configuration, on the data-collecting
performance of the group.

A number of works have investigated multi-robot systems for the mapping of
a static environment, e.g. [1, 5, 13]. Our work brings new contributions to this
research area. First, we give an abstract representation of the problem by modelling
the system as a set of probabilistic equations. In this respect, our work follows a
current line of research which develops probabilistic models of multi-agent
systems, e.g. for representing biological systems (ant societies [6]) or engineering
systems ([8, 14]). Second, we validate the model by comparing its predictions
to the results of simulations (this paper) and physical experiments ([4]). Thus, our
study complements previous work in the domain by providing both a theoretical
and a practical view of the problem. Finally, we study the mapping of a dynamic
environment, that is, an environment whose landmarks change location frequently,
and propose a distributed control architecture which allows a multi-robot system
to dynamically update its map following the observed changes.



2 The simulations 

2.1 The experimental set-up 








Fig. 1. Arenas of 2 meters (left) and 1 meter (middle) in diameter with 5 and 15
worker robots respectively. The database robot stands in the centre of each arena. There are 10 and 4 sources
in the big (left) and small (middle) arenas respectively, which are represented as patches of 0.1 m and
0.07 m diameter lying on the floor. Right: division of the small arena into 80 zones.



Simulations were carried out in Webots [9], a 3-D simulator of the Khepera
[11] robots. Simulations use two circular arenas of 1 meter and 2 meters diameter
respectively, as shown in figure 1. The Khepera robot is round with a diameter of
5.5 cm. Thus, we study the exploration strategy in environments which are about 1600
(big arena) and 400 (small arena) times the robot’s size.

The simulator gives a relatively faithful representation of the Khepera robots
by incorporating imprecise movements of the robots’ wheels (slipping) and
introducing noise into the robots’ sensor measurements. Each robot is provided
with 9 infra-red (IR) sensors (used to detect other robots and the arena walls;
the 9th IR sensor is activated only by the walls and allows the robot to distinguish
between robots and walls), a detector of ground colour (used to distinguish between
zones with and without sources), a radio transceiver, a compass with 5° precision
and one odometry counter on each wheel.¹ Compass and odometry sensors
are used by the robots to determine their location relative to the centre of the
arena. The robots reset their position to the correct one each time they meet
the database robot or hit a wall in the arena. The odometry errors are therefore
kept within 10 percent. The sources’ locations, given
as an angle and a distance relative to the centre, are determined following a
scaling of the arena into 5 × 16 = 80 (small arena) and 10 × 16 = 160 (big arena)
zones; see the schema in figure 1. Thus, the sources’ locations are known to a
precision of 22.5 degrees (for the angle) and 10 cm (for the distance).
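The arena scaling amounts to a polar grid: 16 angular sectors of 360/16 = 22.5° and rings 0.1 m wide (5 rings for the 0.5 m radius of the small arena, 10 for the 1 m radius of the big one). A minimal Python sketch of the mapping from a position relative to the centre to a zone index, assuming equal-width rings and sectors counted counter-clockwise from the x-axis; the sector origin and the function name are assumptions.

```python
import math


def zone_index(x: float, y: float, n_rings: int, arena_radius: float,
               n_sectors: int = 16) -> tuple[int, int]:
    """Map a position (metres, relative to the arena centre) to (sector, ring)."""
    theta = math.degrees(math.atan2(y, x)) % 360.0   # angle in [0, 360)
    rho = math.hypot(x, y)                           # distance from the centre
    sector = int(theta // (360.0 / n_sectors)) % n_sectors       # 22.5 deg wide
    ring = min(int(rho // (arena_radius / n_rings)), n_rings - 1)  # 0.1 m wide
    return sector, ring


# Small arena: 5 rings x 16 sectors = 80 zones; big arena: 10 x 16 = 160 zones.
print(zone_index(0.1, 0.45, n_rings=5, arena_radius=0.5))   # (3, 4)
```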



2.2 The robots’ controllers 

All worker robots have the same controller, which is composed of five modules:
1) an obstacle avoidance module, which consists of a one-layer real-valued
feed-forward neural network with eight input units (one for each infra-red sensor
measurement) and two output units for the two motors (speed control); 2)
a memory-based exploration module, which determines the robot’s direction of
travel when crossing the border between two zones of the arena (following the
division represented in figure 1): each robot keeps track of the number of times
it has crossed each zone and, when it estimates that it has reached the border
between two zones, turns toward the zone it has visited least so far; 3)
a communication module, which consists of two rules: the robot emits the location
of one source (chosen randomly among all the locations it knows) when it meets
another robot (using a point-to-point protocol, with acknowledgement from the
receiving robot), and the robot broadcasts locally (within a limited range) the location
of a source when it discovers one; 4) an odometry module, which calculates the robot’s
position relative to the database robot (centre of the arena) from the measurements
of the wheel counters and of the compass; 5) a learning module, which consists
of a bidirectional associative memory: the robot keeps track of the sources’
locations by correlating the two outputs of the odometry module, namely the
angle $\theta$ and the distance $\rho$, the polar coordinates of the robot relative to the
centre of the arena. Each connection of the module between an angle and a
distance measurement is bidirectional and is associated with two parameters, a
weight $w_{ij} = w_{ji}$ and a time parameter $\tau_{ij} = \tau_{ji}$. The associative module takes
binary inputs (1/0); each input corresponds to a measure of angle and distance
following the arena’s scaling (see figure 1).

¹ All sensors used in the simulations exist for the real Khepera robots.

The weights $w$ and time parameters $\tau$ are two matrices of 10 by 16 units (for
simulations in the big arena) and of 5 by 16 units (for simulations in the small arena).
Each experiment starts with all weights $w$ and time parameters $\tau$ set to zero. The
learning algorithm is a system of three rules:



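1. Learning by seeing:

If the robot detects an object, then

$w_{\theta,\rho} = 99$ and $\tau_{\theta,\rho} = t$

where $t$ is the time measured by the clock of the robot².

2. Forgetting:

If the robot crosses a location given by $\theta, \rho$ such that $w_{\theta,\rho} > 0$ but does not detect
an object, then

$w_{\theta,\rho} = 1$ and $\tau_{\theta,\rho} = t$

3. Learning by hearing:

If the robot hears the location of an object as told by another robot, then, if

$\tau'_{\theta',\rho'} > \tau_{\theta',\rho'}$

then

$w_{\theta',\rho'} = \frac{w_{\theta',\rho'} + w'_{\theta',\rho'}}{2}$ and $\tau_{\theta',\rho'} = \tau'_{\theta',\rho'}$

$\theta'$, $\rho'$, $w'_{\theta',\rho'}$, $\tau'_{\theta',\rho'}$ are the angle, distance, weight and time parameter transmitted
by the emitter robot.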
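The three rules can be sketched as a small Python class holding the weight and time-parameter matrices, indexed by (angle sector, distance ring). This is an illustration under stated assumptions, not the authors' code: rule 3 follows the reading that the two weights are averaged and the received time stamp adopted only when it is more recent, and the threshold H is passed in as a parameter rather than computed from M(w), m(w) and Ma(w). Class and method names are hypothetical.

```python
class AssociativeMemory:
    """Sketch of the three-rule learning module (weights w, time stamps tau)."""

    def __init__(self, n_angles: int = 16, n_dists: int = 5):
        # All weights and time parameters start at zero.
        self.w = [[0.0] * n_dists for _ in range(n_angles)]
        self.tau = [[0] * n_dists for _ in range(n_angles)]

    def see(self, a: int, d: int, t: int) -> None:
        """Rule 1, learning by seeing: object detected in zone (a, d)."""
        self.w[a][d], self.tau[a][d] = 99.0, t

    def cross_empty(self, a: int, d: int, t: int) -> None:
        """Rule 2, forgetting: a known location is crossed, no object seen."""
        if self.w[a][d] > 0:
            self.w[a][d], self.tau[a][d] = 1.0, t

    def hear(self, a: int, d: int, w_msg: float, tau_msg: int) -> None:
        """Rule 3, learning by hearing: average only if the message is newer."""
        if tau_msg > self.tau[a][d]:
            self.w[a][d] = (self.w[a][d] + w_msg) / 2
            self.tau[a][d] = tau_msg

    def correct_pairs(self, H: float) -> int:
        """Number of (angle, distance) pairs whose weight exceeds threshold H."""
        return sum(1 for row in self.w for x in row if x > H)
```

A weight that has been set once never returns to zero: rule 2 only lowers it to 1, which keeps the location on the list of places to re-check.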



The learning of one robot is evaluated by counting the number of correctly
correlated pairs of coordinates. Learning is successful when this number is equal
to the number of different locations. A pair of coordinates $\{\theta, \rho\}$ is considered
correctly correlated when the weight $w_{\theta,\rho}$ is greater than a threshold $H$. $H$ is
calculated at each time step as a function of the current values of all the weights
$w$, using $M(w) = \max_{w>0}(w)$ and $m(w) = \min_{w>0}(w)$, the maximum and
minimum values of the weights over all $w > 0$, and $Ma(w) = \mathrm{mean}_{w>0}(w)$,
the arithmetic mean calculated over all $w > 0$.

$H$ estimates the threshold between the important weights (close to 99), which
correspond to correct correlations, and the small weights (close to 1), which are
noisy or discarded correlations. The calculation is based on the hypothesis that
the correct and incorrect correlations are uniformly distributed, that is,
equiprobable over all pairs. This is correct to some extent, as there is no a priori
bias on the number of times the robots can observe each object. Note that there
is no total forgetting of a source’s location; that is, the weight associated with the
corresponding pair of coordinates never returns to zero. The minimal value for
a weight which has been updated once is 1. This value can never be greater
than the threshold; therefore, a location associated with a weight equal to 1
is always discarded as no longer valid. Keeping track of all the sources’ locations
which have once been discovered results in the robots constantly checking the
validity of these locations (following rule 2). Forgetting and relearning of locations
is thus made faster. The learning algorithm described above is a simplification
of the DRAMA connectionist architecture [3]. The reader can also refer to [4]
for further explanations.

² The clock is incremented at each processing cycle and is set to zero when the
experiment starts.

The database robot’s controller consists of the same learning module as the
worker robot. When worker and database robots meet, the worker robot transmits
its matrices of weights and time parameters to the database robot (all $w$ and
$\tau$, whereas in a communication between two worker robots only one pair $w, \tau$,
corresponding to one location, is transmitted). Following rule 3, the database robot
calculates the mean value between its current set of weights (previously collected from
other robots) and those newly transmitted, if and only if the new information is more
recent than its current one. The database robot then transmits the mean matrix of
weights and time parameters back to the worker robot. After a meeting with the
database, a worker robot therefore has the same global knowledge of the environment as
the database. This speeds up the forgetting process, as the robot can then verify
more locations (all the locations which have been recorded by the group) than
only those it had stored itself.
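The database exchange applies rule 3 entry by entry to the whole matrices and returns the merged result to the worker. A minimal Python sketch, assuming the matrices are lists of lists of equal shape; the function name is hypothetical.

```python
def merge_with_database(db_w, db_tau, worker_w, worker_tau):
    """Database robot update on meeting a worker (sketch).

    For every (angle, distance) entry, the worker's weight is averaged into
    the database only if the worker's time stamp is more recent (rule 3).
    Copies of the merged matrices are returned for the worker to adopt.
    """
    for i, row in enumerate(worker_w):
        for j, w_new in enumerate(row):
            if worker_tau[i][j] > db_tau[i][j]:
                db_w[i][j] = (db_w[i][j] + w_new) / 2
                db_tau[i][j] = worker_tau[i][j]
    # The worker receives independent copies of the database's knowledge.
    return [r[:] for r in db_w], [r[:] for r in db_tau]
```

Returning copies rather than the database's own lists keeps later updates by the worker from silently altering the database state.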



3 The probabilistic model 

The aim is to define an equation which will allow us to determine $T$, the minimal
time for the database robot to learn the locations of $N_s$ objects, given that there
are $N_r$ worker robots, that the arena has size $A$ and that an object covers a
surface $S_s$.

We define the building blocks, or fundamental probabilities, of the model by
considering the geometrical configurations of the system. We define the probability
of meeting the database robot ($P_{db}$) as the ratio of the surface $S_d$, within which
another robot detects the database robot, over the arena’s surface $A$, i.e.
$P_{db} = S_d/A$. Similarly, the probabilities of meeting another robot ($P_r = S_r/A$),
of passing across a source ($P_s = S_s/A$) or of being in the range of communication
of another worker robot ($P_c = S_c/A$) are the ratios of the surfaces of each of these
objects over the arena’s surface, with $A = \pi r^2$ ($r = 0.5$ m or $r = 1$ m for the
small/big arenas), $S_d = \ldots$ (small arena), $S_d = \pi \cdot 0.15^2\,\mathrm{m}^2$ (big arena),
$S_r = \pi \cdot 0.1^2\,\mathrm{m}^2$, $S_s = 0.0038\,\mathrm{m}^2$ (small arena), $S_s = 0.0078\,\mathrm{m}^2$ (big arena)
and $S_c = \ldots$
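The fundamental probabilities are plain surface ratios. The Python sketch below evaluates the ones whose surfaces are given above; the small-arena S_d and S_c are omitted because their values are not given. As a consistency check, the quoted S_s values match the areas of the 0.07 m and 0.1 m source patches of Fig. 1 (π·0.035² ≈ 0.0038 m², π·0.05² ≈ 0.0078 m²). The dictionary layout and function name are illustrative.

```python
import math

# Arena radii (m) and the surfaces quoted in the text (m^2).
ARENAS = {
    "small": {"r": 0.5, "S_s": 0.0038},
    "big": {"r": 1.0, "S_s": 0.0078, "S_d": math.pi * 0.15 ** 2},
}
S_r = math.pi * 0.1 ** 2   # surface for meeting another robot (both arenas)


def probabilities(name: str) -> dict:
    """P_x = S_x / A for every surface known for this arena."""
    a = ARENAS[name]
    A = math.pi * a["r"] ** 2
    probs = {"P_s": a["S_s"] / A, "P_r": S_r / A}
    if "S_d" in a:
        probs["P_db"] = a["S_d"] / A
    return probs


for arena in ARENAS:
    print(arena, {k: round(v, 5) for k, v in probabilities(arena).items()})
```

For the big arena this gives P_r = 0.01 and P_db = 0.0225, since both surfaces are disks and A = π.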

Let $P_{success}(N_r, N_s, T)$ be the probability that the event “the database robot
has recorded $N_s$ locations” has occurred after a time $T$. This event is true if each
of the $N_s$ locations has been seen by at least one robot and transmitted to
the database robot at least once within a time $T$, i.e.:

$$P_{success}(N_r, N_s, T) = 1 - \left(1 - P_{L\text{-}success}(N_s, T)\right)^{N_r}$$



$P_{L\text{-}success}(N_s, T)$ is the probability that a first event, “all $N_s$ locations have
been transmitted”, has occurred within a period $T - t_i$, given that a second
event, “all $N_s$ locations have been learned”, has happened within a period $t_i$. This
conditional probability can be expressed as follows:

$$P_{L\text{-}success}(N_s, T) = P(\text{see DB in } T - t_i \mid \text{learn objects in } t_i)$$
The two events are independent; thus the probability of their co-occurrence for
a given pair $\{t_i, T - t_i\}$ is the product of each event’s probability. The total
probability is the normalised sum of this product over all possible pairs
$\{t_i, T - t_i\}$ (time is discretised):

$$P_{L\text{-}success}(N_s, T) = \frac{\sum_{t_i} P_{learn\text{-}object}(N_s, t_i) \cdot P_{see\text{-}database}(T - t_i)}{\sum_{t_i} P_{learn\text{-}object}(N_s, t_i)} \qquad (1)$$



The probability of meeting the database robot, $P_{see\text{-}database}(T - t_i)$ in
equation 1, is the probability of crossing the surface $S_d$ within a period $T - t_i$:

$$P_{see\text{-}database}(T - t_i) = 1 - (1 - P_{db})^{T - t_i}$$

$P_{learn\text{-}object}(N_s, t_i)$ is the probability that the event “a robot has learned $N_s$
object locations in a time $t_i$” is true. A robot learns about an object’s location if
the robot either sees the object ($P_s$) or hears its location from another robot ($P_{hear}$):

$$P_{learn\text{-}object}(N_s, t_i) = \left(1 - P_{not\text{-}learn\text{-}object}^{\,t_i}\right)^{N_s}, \qquad P_{not\text{-}learn\text{-}object} = (1 - P_s) \cdot (1 - P_{hear})$$



The probability of hearing an object’s location from another robot’s broadcast
is the probability that the three following events are true: 1) the listener
robot is within an area $S_c$ around the emitting robot ($S_c$ is the surface within
which the communication is audible); 2) the emitting robot broadcasts the
particular location; and 3) no other robot out of the $N_r - 1$ (excluding the emitting
robot, including the listener robot) is simultaneously emitting in that same area
($P_{Interf}$ is the probability of this event). It follows:



^^ear ' ^Interf' ^nterf ~ ^^ear) * ^ * ^ear ~ (^) ' ^^ot-emit)‘ 



Event 2 is true if the emitter robot either sees the object (it then broadcasts
the location) or meets another robot and transmits that particular location to it.
In the latter case, the emitting robot chooses 1 location among the $2 \cdot N_s$ it knows,
which comprise the $N_s$ correct and the $N_s$ no longer valid locations. The latter event
can occur only if the robot has seen that object within a time $< t_i$ before meeting
another robot. It follows:



^^ot-emit — (1 ■ (1 ' 



2 . N. 






In the above equations, surfaces are expressed in square meters and the unit of time corresponds to the time needed to cover the surface S_g (the smallest surface considered in the equations). To convert time into seconds, T has to be multiplied by a factor that depends on V_r = 0.16 m/s, the maximal speed of the robots, and D_r = 0.055 m, the diameter of the robot.
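To make equation (1) concrete, here is a minimal Python sketch of the fully specified parts of the model: P_see-database, P_learn-object and their normalised combination. It assumes that P_s, P_h and P_db are given per-time-unit probabilities (P_h in particular is passed in as a number rather than built from P_hear, P_Interf and P_not-emit), that the sum in equation (1) runs over t_i = 0, ..., T, and that time is in the model's own units; the function names are illustrative, not taken from the authors' code.

```python
def p_see_database(tau, p_db):
    """P_see-database: chance of meeting the database robot at least once in tau time units."""
    return 1.0 - (1.0 - p_db) ** tau


def p_learn_object(n_objects, t, p_s, p_h):
    """P_learn-object: chance that all n_objects locations have been learned within t units."""
    p_not_learn = (1.0 - p_s) * (1.0 - p_h)  # neither sees nor hears a given object in one unit
    return (1.0 - p_not_learn ** t) ** n_objects


def p_success(T, n_objects, p_s, p_h, p_db):
    """Equation (1): normalised sum over the discretised splits t_i + (T - t_i) = T."""
    weights = [p_learn_object(n_objects, t, p_s, p_h) for t in range(T + 1)]
    total = sum(weights)
    if total == 0.0:  # nothing can have been learned yet (e.g. T = 0)
        return 0.0
    return sum(w * p_see_database(T - t, p_db) for t, w in enumerate(weights)) / total
```

For example, `p_success(200, 4, 0.01, 0.005, 0.02)` evaluates equation (1) for a hypothetical set-up with 4 objects; the three probabilities are placeholders, not values reported in the paper.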



4 Results 

4.1 Probabilistic model versus simulations 

A first set of Webots simulations was carried out in a static environment (i.e. the locations of the objects did not change). 100 and 150 runs were carried out in the small and big arenas respectively. Each run simulated 1000 seconds. For a given number of robots, 10 runs were performed, each with a different random seed. The number of worker robots was varied from 1 to 10 in the small arena and from 1 to 15 in the big arena. We measured the mean time delay after which the database robot knew the locations of all 4 (small arena) and 10 (big arena) objects. In figure 2, we compare the prediction of the probabilistic model with the results of the simulations. As one would expect, the more robots, the faster the learning. However, the relation between these two variables is not linear, and the gain in time efficiency saturates for large numbers of robots. Thus, if one were to implement the system on a real robotic set-up based on these results, one would determine the optimal number of robots by weighing the gain in time efficiency against the cost of adding robots.












Fig. 2. The Y-axis represents the mean (over 10 runs) time delay T and the X-axis the number of robots. Each panel compares the prediction of the probabilistic model with the Webots simulations ('*' points with error bars) in the small arena (left) and in the big arena (right).



For both small and big arenas, the results of the probabilistic model and of the simulations are qualitatively and quantitatively similar. This means that, although the probabilistic model is a crude representation of the system, it approximates the correlations between the main system variables well. Two aspects of the simulations are, however, not represented by the model. 1) The probabilistic model assumes a uniform coverage of the space, in which all points of the space are visited with the same frequency; this does not take into account the boundary effects due to the walls, which make the centre of the arena (i.e. the database robot's location) more often visited than its periphery (this effect is amplified by the exploration strategy, which gives preference to moving towards the database robot when crossing the border of the 16 central zones). In order to represent this effect in the model, we increased S_db above its real geometrical value so as to obtain the same probability of meeting the database robot (P_db = S_db/A) as that measured in the simulations. 2) The probabilistic model assumes that learning a source's location is perfect, i.e. when a robot crosses the source, it learns its location; this does not take into account the imprecise determination of the location due to odometry error¹.

The probabilistic model is, therefore, a good first approximation of the studied system. It allows one to determine the optimal efficiency of the system in the ideal case; however, more realistic simulations, such as those done in Webots, should be carried out in order to evaluate the importance of the aspects mentioned above, which are not represented by the model. The probabilistic model is general to the extent that it makes no requirement on the type of robots and environment used. Thus, it could potentially be applied to other experimental set-ups for the same task. It is a parameter-free model: to apply it to another set-up, one should simply set the parameters defining the robots (speed of movement, size) and the environment (dimensions and number of sources) to the experimental values.



4.2 Learning in a dynamic environment 

Three sets of 15 runs were carried out in a dynamic environment, in which the objects changed locations with a constant update rate R varied from 1 to 15. Each run lasted 5000 · R seconds, during which all the objects' locations changed every 8 · R seconds (small arena) and every 28 · R seconds (big arena), that is, several hundred changes per run. We ran simulations with groups of 1, 3 and 5 robots in the small arena and 5, 10 and 15 robots in the big arena. Figure 3 (top left and top right) shows the mean number of correctly and incorrectly learned locations over the whole run for the small and big arenas respectively. The results for the three robot configurations are superimposed. For R less than 5, the database robot knows on average about 50% of the correct locations, while still taking as correct almost 50% of the locations that have since been updated. As long as R is less than 5, the environment changes faster than the minimal time delay T required for the robots to learn all the locations. The minimal T was measured in the simulations of section 4.1 (see figure 2) as 40 and 140 seconds for the small and big arenas respectively (the measures were consistent with the probabilistic predictions). For R greater than 5, the proportion of correctly learnt locations increases steadily, while the proportion of incorrectly learnt locations decreases by the same amount. There is almost no difference between the three robot configurations in each case. Although the minimal time delay for complete learning decreases with the number of robots (see figure 2), the gain is small (about 10%). In addition, because of the large variance of the measured time values (see the error bars in figure 2), the learning performance of the database robot appears on average similar for the three robot configurations. Thus, there is almost no benefit in using 15 rather than 5 robots in the big arena, or 5 robots rather than 1 in the small arena.
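The threshold at R = 5 follows from simple arithmetic on the figures quoted above. The sketch below (plain Python; the names are illustrative) computes the number of location changes per run and the smallest integer update rate R for which the change interval, 8 · R or 28 · R seconds, is at least the minimal learning delay of 40 s or 140 s.

```python
import math

T_MIN = {"small": 40.0, "big": 140.0}        # minimal learning delay from section 4.1 [s]
CHANGE_FACTOR = {"small": 8.0, "big": 28.0}  # locations change every factor * R seconds
RUN_FACTOR = 5000.0                          # each run lasts 5000 * R seconds


def changes_per_run(arena):
    """Location changes per object in one run; R cancels because both durations scale with R."""
    return math.floor(RUN_FACTOR / CHANGE_FACTOR[arena])


def critical_rate(arena):
    """Smallest integer R whose change interval is at least the minimal learning delay."""
    return math.ceil(T_MIN[arena] / CHANGE_FACTOR[arena])
```

This gives 625 changes per run in the small arena and 178 in the big arena, and a critical rate of R = 5 in both arenas, matching the point at which the learning curves in figure 3 start to improve.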

¹ Thanks to the resetting strategy, the odometry errors are in fact negligible compared to the environmental discretisation, see section 2.1.
Fig. 3. Top: mean number of correctly and incorrectly learned locations over the whole run for the small (left) and big (right) arenas; superposition of the results for three robot configurations (left: 1, 3, 5 robots; right: 5, 10, 15 robots). Bottom: state of the database's knowledge (number of known locations) along a run, for the 5-robot configuration in the small arena (left) and the 10-robot configuration in the big arena (right).



Figure 3 (bottom) shows the progression of the learning of the database robot along a run (results of the simulation in the small (left) and big (right) arenas with 5 and 10 robots respectively, for R = 5). The curve varies from zero (no locations known) to the maximum (4 and 10 correctly known locations in the small and big arenas). In the simulation, the objects are not displaced simultaneously: the rate at which every object is displaced is constant (it is set by R), but the moment at which the first displacement occurs differs from object to object. This explains why the learning curve does not always drop to zero (new locations are discovered and transmitted before all locations have been changed).

5 Conclusion 



This paper presented a multi-robot system composed of a group of mobile worker robots and one static database robot. A learning algorithm was proposed, composed of learning and forgetting processes, which allows a group of robots to keep an up-to-date account of the environmental state when this changes regularly. The correlations between the variables of the system (the number of robots, the frequency of environmental change and the environment's configuration) were modelled by probabilistic equations. Simulations were carried out in a 3-D simulator of Khepera robots, which confirmed the predictions of the probabilistic model. These experiments demonstrated that the proposed multi-robot system, which is based on an associative memory learning algorithm, is successful at learning the topography of an environment which changes with a constant frequency.

Acknowledgements. Many thanks to Luca Gambardella and the anonymous reviewers for useful
comments on these experiments. Many thanks also to the LAMI, EPFL, which provided the facilities
and technical support for these experiments. This research was supported by a grant of the Swiss 
National Research Foundation, project “A methodology for collective robotics design”. 



References 

1. Amat, J., Ldpe 2 de Mintaras, R., Sierra, C. "Cooperative Autonomous Low-Cost Robots for 
Exploring Unknown Environments”, In O. Khatib and J.K. Salisbury, editors, Proceeding of the 
Fourth Symposium on Experimental Robotics ISER-95, Stanford, US, June 30- July 2, 1995, 
Lecture Notes in Control and Information Sciences, Springer Verlag, pp. 41-49. 

2. T. Balch Sc R. C. Arkin, (1994), ‘Communication in reactive multiagent robotic systems’. Au- 
tonomous robots Journal, 1, pp. 27-52. 

3. A. Blllard Sc G. Hayes, 1999, ’’DRAMA, a connectionist architecture for control and learning 
in autonomous robots”. Adaptive Behaviour Journal, vol. 7:1. 

4. A.Billard, A-J. Ijspeert, A. Martinoli. “A multi-robot system for adaptive exploration of a fast 
changing environment: probabilistic modelling and experimental study”. Submitted to Connec- 
tion Science, special issue on Adaptive Robots. March 99. 

5. W. W. Cohen; (1996), ‘Adaptive mapping and navigation by teams of simple robots’, Robotics 
and autonomous systems, 18, pp. 411-434. 

6. E. Bonabeau, G. Theraulaz Sc J-L Deneubourg, (1998), ‘Fixed response thresholds and the 
regulation of division of labor in insect societies’. Bulletin of Mathematical Biology, 60, pp. 
753-807. 

7. M. J Mataric, (1998), ‘Using Communication to Reduce Locality in Distributed Multi-Agent 
Learning’, Journal of Experimental and Theoretical Artificial Intelligence, special issue on 
Learning in DAI Systems, Gerhard Weiss, ed., 10(3), Jul-Sep, pp. 357-369. 

8. A. Martinoli, A. Ijspeert, F. Mondada, (1999), ‘Understanding collective aggregation mech- 
anisms: from probabilistic modelling to experiments with 

real robots’, Robotics and Autonomous Systems, Elsevier. To appear. Preprint available at 
http : //diwww . epf 1 . ch/laai/teaffl/alcherio/eun_pub .html 

9. O. Michel, (1998), ‘Webots: a Powerful Realistic Mobile Robots Simulator’, Proceeding of the 
Second International Workshop on RoboCup, LNAI, Springer- Verlag. 

10. A. Murciano Sc J. del R. Millan, (1997), ‘Learning signalling behaviour and specialisation in 
Cooperative agents’, Adaptive Behaviour Journal, 5:1, pp. 6-28. 

11. F. Mondada, E. Franzi Sc P. lenne, (1993), ‘Mobile Robot Miniaturisation: a Tool for Investiga- 
tion in Control Algorithms', Proceedings of ISER’93, Kyoto, Japan, October 1993, pp. 501-513. 

12. L. E. Parker, ALLIANCE: An Architecture for Fault Tolerant Multi-Robot Cooperation, IEEE 
Transactions on Robotics and Automation, 14 (2), 1998. 

13. Rekleitis, I. M., Dudek, G., Milios E. E. ”Multi-Robot Exploration of an Unknown Environment, 
Efficiently Reducing the Odometry Error”, Int. Joint Conf. on Artificial Intelligence, August, 
1997, Nagoya, Japan, Morgan Kaufmann, pp. 1340-1345. 

14. E. Yoshida, T. Arai, M. Yamamoto and J. Ota, (1998), ‘Local communication of multiple 
robots: Design of optimal communication area for cooperative teisks’, 7,ourna/ of Robotics Sys- 
tems 15(7), pp. 407-419. 




Task Fulfilment and Temporal Patterns of 
Activity in Artificial Ant Colonies 



Jordi Delgado and Ricard V. Solé

Dept. Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya,
Campus Nord, Modul C6, 08034 Barcelona (Spain)
jdelgado@lsi.upc.es

Complex Systems Research Group,
Department of Physics, FEN, Universitat Politècnica de Catalunya,
Campus Nord, Modul B4, 08034 Barcelona (Spain)
ricard@complex.upc.es

Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, U.S.A.



Abstract. Since the discovery of self-synchronization of activity in ant colonies, some authors have argued that this phenomenon lacks any adaptive significance, while others have suggested that there may be some functional behaviors related to these rhythmical patterns of activity. In this paper, we introduce a mobile automata model of task fulfilment in ant colonies with self-synchronized activity, and test whether these synchronized patterns provide any advantage when compared with non-synchronized activity patterns.



1 Introduction 

Oscillatory patterns of activity in ant colonies were discovered independently by N. Franks in Leptothorax acervorum colonies [15] and B.J. Cole in L. allardycei colonies [7] at the end of the eighties. Since then, self-synchronization has also been found in other species (L. longispinosus, L. ambiguus, L. curvispinosus, L. allardycei and L. muscorum [22, chap. 2], and also in Pseudomyrmex elongatus, P. pallidus, Tapinoma littorale, Zacryptocerus varians and Crematogaster ashmeadi [9]), so it seems to be a very general pattern of temporal behavior. Furthermore, there is strong experimental evidence that these patterns are a collective property, since the individual activation dynamics was found to be chaotic by Cole [8].

Cole [7] discussed the adaptive significance of these short-term activity cycles, arguing that it is unlikely that they contribute to the efficiency of the colony. They are "(...) the inevitable outcome of interactions within social groups" [7, p. 257]. However, further research has been done in order to clarify the relation between functional behavior and oscillatory patterns, and at least two colony activities have been suggested to be enhanced by self-synchronized behavior: brood tending in L. acervorum [19] and task allocation [23]. It is the latter [23] that inspired the present work. G.E. Robinson suggested that self-synchronized behavior provides a mechanism for information propagation:
"Sampling behavior that involves social interactions may be facilitated by synchronous bursts of worker activity, which have been observed in ant colonies (...). The decision of which task to perform would be based on the integration of acquired information, coupled with behavioral biases associated with worker, caste, physiological status and prior experience." [23, p. 652]

So, according to [23], self-synchronization facilitates the sampling of any information an individual may need from other individuals. Let us try to clarify this point. Assuming that ants cannot be active all the time (which is what is observed in nature, see above), why would self-synchronized behavior be a better (simple) strategy than, say, random (in the sense of "non-synchronized") activity patterns? We need further assumptions to answer this question: first, the obvious one of locality (an individual is able to get only local information) and second, the quite reasonable (and biologically plausible [15, 7, 8]) assumption that the only interaction allowed to an inactive individual is to be "awakened" by other individuals; an inactive individual does not carry any information, that is, it is equivalent to a "sleeping" individual. In this context it becomes clear why we should expect an increase of efficiency with synchronized patterns of activity: synchronization maximizes the number of simultaneously active neighbours of an active individual (recall that we are assuming that individuals cannot be active all the time). This reasoning, however clear, must be validated with both theoretical models and experiments, since only by doing so can we explore which assumptions are necessary, and then check the actual state of affairs in real ants. The study of a theoretical model is the main point of this paper.

Designers of artificial societies should also be interested in this subject, since if self-synchronization is able to provide a better functioning of collective systems, it would be highly desirable to implement this mechanism in either collective algorithms or collective robotics [3, chaps. 5-7]. A further advantage is that it is quite easy to implement using only simple local interactions, as the work with Fluid Neural Networks suggests [25, 26] (see below).

In section 2 we introduce our model and our first results are detailed in 
section 3. In section 4 we discuss our results and suggest a simple experimental 
setting to test them. 

2 The Model: Fluid Neural Networks and Finite 
Threshold Models 

In order to start our work, we need to couple two different phenomena: self-synchronized activity and information spreading. There are several mathematical models to deal with the former [2, 25, 26, 28, 18], among which, for simplicity, we have chosen the Fluid Neural Network (FNN) [10-12, 22, 24-26]. Furthermore, the FNN is the model best suited to our purposes, since it allows one to introduce additional behaviors upon the basic activation behavior of individual automata. The latter, information spreading, is a relatively loose notion that we will make more specific through the notion of task spreading. Broadly, a mechanism based on the Finite Threshold Model of division of labour [4] is built upon each individual in an FNN. Besides, individuals are able to communicate a certain fraction of task to their neighbours, spreading it through the system (see details in section 2.3). This is similar to the spread of a large protein source within the nest or to the spread of liquid food via trophallaxis [21].

2.1 Fluid Neural Networks 

In FNNs the standard approach of neural networks is used [1, 17], but a new set of rules defining local movement and individual activation is also introduced. A set of N automata or "neuron-ants" is used. The state of each automaton (say the i-th one) is described through a continuous state variable S_i(t) ∈ ℝ at each time step t ∈ ℕ. Each element can move on an L × L two-dimensional lattice with periodic boundary conditions.

If S_i(t) is the state of a given automaton (the spatial dependence is omitted for simplicity), the new states are updated following

$$S_i(t+1) = \tanh\left[g\, h_i(t)\right] \qquad (1)$$

where g is a gain parameter and h_i(t) can be defined in diverse ways in order to obtain the desired behaviour. We obtain oscillations in activity (see below) if the term h_i(t) includes interactions with the eight nearest neighbours:

$$h_i(t) = S_i(t) + \sum_{j \in B(i)} S_j(t) \qquad (2)$$

where B(i) is the set of nearest automata. To get non-synchronized individuals we do not need interaction with any other automata:

$$h_i(t) = S_i(t) \qquad (3)$$

We have seen above that one of the properties observed in isolated ants is spontaneous activation [8]. In FNNs this has been included in the following way: each automaton can be either active or inactive and, if active, it moves randomly to one of the eight nearest cells (if no space is available, no movement takes place). In our model a given automaton is active if S_i(t) > θ_act and inactive otherwise. Once an automaton becomes inactive, it can return to the active state (with a spontaneous activity level S_a) with some probability p_a.

The collective behaviour we measure in FNNs is the mean activity of the system. We define an activity for each individual, a_i(t) = Θ[S_i(t) − θ_act], so that the mean activity at time t is

$$\rho^A_t = \frac{1}{N}\sum_{j=1}^{N} a_j(t) \qquad (4)$$

where ρ^A_t ∈ [0, 1] and Θ[x] is such that Θ[x] = 1 if x > 0 and Θ[x] = 0 otherwise. We also define the total density of automata as ρ = N/L².

Fig. 1. Evolution in time of ρ^A_t. We get bursts of synchronized behaviour with S_a = 0.01, p_a = 0.001 and non-synchronized behaviour with S_a = 0.1, p_a = 0.03 (circles). In both cases the average activity per individual and time step is ~0.3, that is, an individual is active approximately 30% of the time.

In this paper we have chosen the FNN parameters N, L, g, θ_act, S_a and p_a to obtain the desired behaviour, that is, self-synchronized activity and non-synchronized activity (see figure 1), though in both cases we have imposed the following constraints¹:



- A number of individuals similar to that observed in colonies with synchronized activity [22].

- The activity level per individual must be around ~0.3, that is, each individual is active approximately 30% of the time, on average, as observed in species with synchronized activity [20, 6, 14].

- The density ρ of the system should be around ~0.2, as was observed experimentally in [16], see also [24].

The sort of interactions defined in FNNs (activation among individuals) is currently the subject of experimental research by B.J. Cole and collaborators [9]. An analysis of FNNs is performed in [10-12].
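The FNN dynamics of equations (1)-(4) can be sketched as a short simulation. This is a minimal pure-Python sketch, not the authors' code: it assumes synchronous state updates, random initial states in [0, 1), spontaneous activation applied right after the update, and active automata moving in random order to a randomly chosen free neighbouring cell. The defaults follow the footnote values (N = 120, L = 25, g = 0.1, θ_act = 10⁻¹⁶), which give a density ρ = 120/625 ≈ 0.19.

```python
import math
import random

NEIGHBOURS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def simulate_fnn(n=120, size=25, gain=0.1, theta_act=1e-16,
                 s_a=0.01, p_a=0.001, coupled=True, steps=1000, seed=0):
    """Return the mean activity (fraction of active automata) at every time step."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(size) for y in range(size)]
    pos = rng.sample(cells, n)                 # one automaton per occupied cell
    where = {p: i for i, p in enumerate(pos)}  # cell -> automaton index
    state = [rng.random() for _ in range(n)]   # assumed random initial states
    history = []
    for _ in range(steps):
        # Local field: eq. (2) if coupled (own state + eight neighbours), eq. (3) otherwise.
        field = []
        for i, (x, y) in enumerate(pos):
            h = state[i]
            if coupled:
                for dx, dy in NEIGHBOURS:
                    j = where.get(((x + dx) % size, (y + dy) % size))
                    if j is not None:
                        h += state[j]
            field.append(h)
        # Eq. (1): synchronous update S_i(t+1) = tanh(g * h_i(t)).
        state = [math.tanh(gain * h) for h in field]
        # Spontaneous activation: an inactive automaton jumps to S_a with probability p_a.
        for i in range(n):
            if state[i] <= theta_act and rng.random() < p_a:
                state[i] = s_a
        # Active automata move to a random free neighbouring cell on the torus, if any.
        for i in rng.sample(range(n), n):
            if state[i] > theta_act:
                x, y = pos[i]
                free = [((x + dx) % size, (y + dy) % size) for dx, dy in NEIGHBOURS]
                free = [c for c in free if c not in where]
                if free:
                    target = rng.choice(free)
                    del where[pos[i]]
                    where[target] = i
                    pos[i] = target
        # Eq. (4): mean activity, the fraction of automata with S_i > theta_act.
        history.append(sum(s > theta_act for s in state) / n)
    return history
```

`simulate_fnn(coupled=True, s_a=0.01, p_a=0.001)` corresponds to the synchronized setting and `simulate_fnn(coupled=False, s_a=0.1, p_a=0.03)` to the non-synchronized one; whether the traces reproduce figure 1 depends on the update-order details assumed above.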



¹ To be more specific, the parameters we use are: N = 120, L = 25, g = 0.1 and θ_act = 10⁻¹⁶. We get synchronized behavior with S_a = 0.01, p_a = 0.001 and h_i defined as in eq. 2, and random behavior with S_a = 0.1, p_a = 0.03 and h_i following eq. 3. See [24, 11] for a study of the FNN parameter space.
2.2 Finite Threshold Models

Task allocation in ant societies is one of the most astonishing aspects of the life of a colony. How do individuals "know" what to do all the time? Genetic determination would be an explanation, at least in polymorphic species, but only 17% of the known living ant genera contain species with some degree of polymorphism [21], so most ant species are monomorphic. It is obvious that the "cognitive" capabilities of individuals are not sophisticated enough to decide what task to do next in order to fulfil global colony needs. Thus, this question remains a matter of controversy, though there are some hypotheses at hand (see [23, 27] for reviews of division of labour in insect societies). One of them, the Finite Threshold Model (FTM) [4], is the one we have chosen to model task fulfilment, since it is simple enough to couple with FNNs.

The basic assumptions are that some specific stimulus is associated with each task and that each individual has fixed response thresholds to the various stimuli, so that the lower the threshold, the more likely the individual is to engage in the task, given exposure. There is an experimental basis that justifies this approach; for example, the existence of response thresholds has been demonstrated in honey bees (see [23], and [4] for a more detailed discussion of the experimental basis of the FTM).

Here we will work with only one "abstract", spatially distributed task. Assume that an active, though not working, individual S_i perceives, in some way to be specified below, a quantity s of stimulus. It will engage in the task with probability [4]

$$P_\theta(\text{not working} \rightarrow \text{working}) = \frac{s^2}{s^2 + \theta^2} \qquad (5)$$

where θ is the individual threshold associated with the task. In fact, we will also use the stimulus as a representation of the "amount of task" to be done, so that once an individual is engaged in a task, it has some probability p(s) per unit time of completing it. Throughout this paper we will use the nonlinear response function

$$p(s) = \frac{1}{(1+s)^{\gamma}}$$

so that the greater the task, the less likely the individual is to complete it.



2.3 Coupling FNNs and FTMs 

Let us assume an L × L lattice on which N individuals are spread out. Each individual is characterized by a triple (S_i(t), X_i(t), θ_i), where S_i(t) is the FNN-state of individual i, X_i(t) is a two-valued variable signaling whether the individual is working (X_i(t) = 1) or not (X_i(t) = 0), and θ_i is the FTM threshold. Also, a working (X_i(t) = 1) and active (S_i(t) > θ_act) individual may be doing a certain amount of task c_i(t). Each lattice position is either empty or contains one individual. Besides, it may also contain a certain amount of stimulus.
Initially our system is composed of N non-working individuals with a random initial FNN-state. The thresholds θ_i are initially distributed uniformly between θ_min and θ_max among individuals. A randomly chosen position of the lattice (the "task origin") contains a certain amount of total task C_in to be performed by the individuals, while lattice coordinates other than the task origin are initially empty of stimulus.

S_i(t) evolves in time exactly as in an FNN (section 2.1), so what remains to be defined is the evolution in time of X_i(t) and, eventually, c_i(t), that is, the task fulfilment process. This takes the form of a few rules that are more or less plausible from a biological point of view (see the initial paragraph of section 2):

1. Effective realization of the task: An individual i at time t may be active (S_i(t) > θ_act) and working (X_i(t) = 1), with a certain quantity of task c_i(t) to be done. At time t + 1, this individual may accomplish the task with probability p(c_i(t)), in which case the total task remaining in the system is decreased by an amount c_i(t). If the (active and working) individual i does not get the task done at time t + 1, it may become inactive (S_i(t + 1) ≤ θ_act), in which case the amount of task c_i(t) is stored at the lattice coordinates of individual i. Inactive individuals do nothing, though they may be activated by the FNN dynamics of the system. The next rule deals with the remaining case, that of active and non-working individuals.

2. Propagation of the stimulus: An active and non-working individual may be stimulated by all its active and working nearest neighbours and by the amount of task stored at its lattice position. Each active and working individual, say the j-th, is able to provide a quantity of stimulus that depends on the number n_j of its active and non-working neighbours. A quantity σ_j = β c_j(t)/n_j is the stimulus provided by j to each of its n_j non-working active neighbours. Thus, an active and non-working individual, say the k-th, receives a quantity of stimulus

$$s_k = \sum_{j \in i(k)} \sigma_j + \alpha E_{x,y}$$

where i(k) ranges over the active and working neighbours of the k-th individual and E_{x,y} is the amount of task at the lattice coordinates (x, y) of k. If this individual becomes a worker, with the probability given by equation 5, the quantities σ_j are subtracted from c_j(t) (for every active and working neighbour j of k) and E_{x,y} becomes (1 − α)E_{x,y}. The parameters α and β allow one to control the degree of stimulus received from the task stored in the lattice and the spreading of stimulus from individual to individual, respectively.
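Equation (5) and rule 2 can be sketched for a single active, non-working individual k as follows. The function names and the (c_j, n_j) input format are illustrative assumptions. Following the remark that the stimulus also represents the amount of task, the sketch assumes that a recruited individual takes on c_k = s_k, so recruitment moves task around without creating or destroying it.

```python
import random


def engage_probability(s, theta):
    """Eq. (5): fixed-response-threshold probability of switching to working."""
    if s <= 0.0:
        return 0.0
    return s * s / (s * s + theta * theta)


def stimulus(cell_task, worker_neighbours, alpha, beta):
    """s_k from the active working neighbours and the task E_xy stored at k's cell.

    worker_neighbours: list of (c_j, n_j) pairs, where n_j is the number of
    active non-working neighbours of the working individual j.
    """
    shares = [beta * c_j / n_j for c_j, n_j in worker_neighbours]  # sigma_j
    return sum(shares) + alpha * cell_task, shares


def try_recruit(theta_k, cell_task, worker_neighbours, alpha, beta, rng):
    """Rule 2: return (recruited, c_k, updated c_j values, updated E_xy)."""
    s_k, shares = stimulus(cell_task, worker_neighbours, alpha, beta)
    if rng.random() < engage_probability(s_k, theta_k):
        new_c = [c_j - sigma for (c_j, _), sigma in zip(worker_neighbours, shares)]
        return True, s_k, new_c, (1.0 - alpha) * cell_task
    return False, 0.0, [c_j for c_j, _ in worker_neighbours], cell_task
```

For instance, with α = 0.25, β = 0.85, two working neighbours carrying 8 and 4 units of task and 10 units stored at k's cell, a recruited k receives s_k = 3.4 + 3.4 + 2.5 = 9.3, and the total of 22 units is unchanged.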



3 Comparing Self-synchronized Activation with
Non-synchronized Activation

The model detailed in section 2 has been used to test whether a system with self-synchronized activity patterns is able to perform an "abstract" task better than a system with random activity patterns, both subject to the constraints mentioned in section 2.1. What does "better" mean in this context? Here it means "faster". Thus, we want to measure how fast a system with a given pattern of activation is able to perform a certain fraction of the initial task C_in.
Fig. 2. Single-run simulations comparing C_SS(t) (triangles) and C_NS(t) (circles): (A) with C_in = 10000; (B) with C_in = 50000. In both cases α = 0.25 and β = 0.85.



We will measure how the total task that remains to be done in the system,

$$C(t) = \sum_{1 \le x,y \le L} E_{x,y}(t) + \sum_{i=1}^{N} c_i(t)\, X_i(t) \qquad (6)$$

evolves in time with two different patterns of activity: self-synchronized (SS) and
non-synchronized (NS). This has been done for a system in which each individual has a threshold θ_i chosen at random with uniform distribution in the interval (1, 10), reflecting a plausible lack of threshold uniformity in real ant colonies [4]. Figure 2 shows that there is no difference in behaviour with a small C_in, whereas a larger initial task makes the SS behaviour much more efficient than the NS one, that is, the SS system gets the task done in less time than the NS system. More systematic calculations have been done by computing the difference between t*_NS and t*_SS, where t*_SS is the first time step such that the system with self-synchronized behaviour verifies C(t*_SS) < 0.1 (some lower bound is needed, since the amount of task remaining decreases asymptotically to zero). We define t*_NS in the same way. As we can see in figure 3, the difference

$$E(C_{in}) = t^*_{NS} - t^*_{SS}$$

averaged over M measures grows with C_in, making clear that the larger the amount of task to be done, the more efficient the SS behaviour.
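The measurement just described, equation (6) together with the first-passage times t*_SS and t*_NS, can be sketched as below. The inputs are hypothetical: a dictionary of stored task E_{x,y} per lattice cell, lists of c_i(t) and X_i(t), and precomputed time series C(t) for SS and NS runs. Pairing the runs is an assumption of the sketch; with equal numbers of runs, the average of the differences equals the difference of the average first-passage times, so it does not change E(C_in).

```python
def remaining_task(stored, tasks, working):
    """Eq. (6): task stored on the lattice plus task carried by working individuals."""
    return sum(stored.values()) + sum(c for c, x in zip(tasks, working) if x)


def first_passage_time(remaining, threshold=0.1):
    """First time step t with C(t) < threshold, or None if it is never reached."""
    for t, c in enumerate(remaining):
        if c < threshold:
            return t
    return None


def efficiency_gap(ss_runs, ns_runs, threshold=0.1):
    """E(C_in): average of t*_NS - t*_SS over M runs of each activity pattern."""
    gaps = []
    for ss, ns in zip(ss_runs, ns_runs):
        t_ss = first_passage_time(ss, threshold)
        t_ns = first_passage_time(ns, threshold)
        if t_ss is None or t_ns is None:
            raise ValueError("a run never brought C(t) below the threshold")
        gaps.append(t_ns - t_ss)
    return sum(gaps) / len(gaps)
```

A positive value of `efficiency_gap` means that, on average, the SS system reaches C(t) < 0.1 earlier than the NS system.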

Robinson (see section 1 and [23]) gave us some clues to ascertain the causes of the superior efficiency of SS behaviour. The key idea is that the spreading of the task is enhanced by the greater number of active neighbors an individual has, on average, in the SS case. This allows the individuals to distribute the task faster, "breaking" it up into smaller pieces, so that it is much more likely that an individual completes its task (recall that the completion probability p(s) decreases as the amount of task s grows).
Fig. 3. E(C_in) (see text) averaged over M = 25 samples for each C_in, with parameters α = 0.25 and β = 0.85.



4 Discussion 

In this paper we have introduced a framework with which to study the relation between patterns of activity in social insects and the ability to fulfil some task. We have seen that self-synchronization enhances the efficiency with which the system performs some sort of "abstract" task. Our first numerical results with the coupling between the FNN and the FTM suggest that observations in real colonies concerning the functional side of synchronized patterns may reflect a more general relation between task fulfilment and self-synchronization since, as Robinson [23] points out in the quotation above, ascertaining the mechanisms by which tasks and/or information spread inside the colony is important to account for the task allocation abilities of social insects.

The phenomenon is not difficult to understand. The key point is the idea that "if it is not active, it does not work", together with the local transmission of information (stimulus, task, etc.) from individual to individual. If individuals are not permanently active, the only way to ensure that an active individual has as many active neighbours as possible that can be stimulated, and consequently to get the task as scattered as possible, is to have synchronized activity. Besides, individuals are more likely to start tasks immediately on being activated, since the activating neighbours may be carrying work. We are currently working on the precise relationship between activity patterns and task spreading, using the model introduced in this paper to study the influence of … on the efficiency of task fulfilment (our first results were recently presented in [13]).

Related work was presented in [5], where it was suggested that synchronization may enhance foraging efficiency. Finally, some experiments [16] and previous work with FNNs [24] clearly suggest a dependence of self-synchronization on colony density. Thus, it would be possible to do experimental work with real colonies by artificially modifying their density (as in [16]) so as to induce non-synchronized behavior. This setting would allow us to compare task fulfilment in colonies with different patterns of temporal activity.
Abstract. In this paper we place ant algorithms in a reinforcement
learning framework. We concentrate on the original Ant System and
briefly discuss Ant Colony System. We show that ant-quantity and
ant-density can be considered as TD(0) algorithms which only take into
account immediate reinforcement, whereas ant-cycle is basically an
on-policy Monte Carlo method. We introduce the notion of decay traces for
modeling the decay of the trail.



1 Introduction: Ants and Reinforcement Learning

In this paper we try to formulate the Ant System (AS) algorithms as Reinforcement
Learning algorithms. AS was the first system in the Ant Colony Optimization
area. This field studies artificial systems that are inspired by the behavior of real
ant colonies. The main observation is that real ants are capable of finding
shortest paths thanks to a simple pheromone trail-laying mechanism. A good
overview of the state of the art in the field and the definition of a meta-heuristic
are given in [2]. In the next section we begin by introducing AS. We then analyze
the three algorithms of AS in Subsections 2.1 and 2.2, and finally, in the conclusions,
we compare the experimental results of AS with our theoretical findings. Details on
Reinforcement Learning theory can be found in [5]. A more elaborate version of
this work is reported in [4].

2 Ant System 

The three algorithms of Ant System for solving the TSP and ATSP presented in
[1] are ant-density, ant-quantity and ant-cycle. At a time instance $t$, every town
$i$ holds a number of ants, each of which chooses the next town to visit with a probability
that is a function of the town distance and of the amount of pheromone trail present on
the connecting edges. Transitions to already visited towns are disallowed (enforced
by a list of visited towns kept in the memory of each ant). The algorithms
differ in how they update the pheromone trail. Let $\tau_{ij}$ be
the intensity of the trail on edge $(i,j)$. The trail is updated as follows:

$$\tau_{ij}(t+h) \leftarrow \rho\,\tau_{ij}(t) + \Delta\tau_{ij}(t,t+h) \tag{1}$$
where $\rho$ is a trail decay coefficient such that $0 < \rho < 1$ and

$$\Delta\tau_{ij}(t,t+h) = \sum_{k=1}^{m} \Delta\tau_{ij}^{k}(t,t+h)$$

$\Delta\tau_{ij}^{k}(t,t+h)$ is the quantity per unit of length of trail substance laid on edge
$(i,j)$ by the $k$-th ant between time $t$ and $t+h$. For ant-density and ant-quantity,
$h = 1$; for ant-cycle, $h = n$. The total number of ants in the system is $m$, and the
transition probability is:



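$$p_{ij}^{k}(t) = \begin{cases} \dfrac{[\tau_{ij}(t)]^{\alpha}\,[\eta_{ij}]^{\beta}}{\sum_{l \in \mathit{allowed}_k} [\tau_{il}(t)]^{\alpha}\,[\eta_{il}]^{\beta}} & \text{if } j \in \mathit{allowed}_k \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

$\eta_{ij} = 1/d_{ij}$ is called the visibility; $\alpha$ and $\beta$ control the relative importance of trail and visibility. For the ant-density model we have: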



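$$\Delta\tau_{ij}^{k}(t,t+1) = \begin{cases} Q_1 & \text{if the } k\text{-th ant goes from } i \text{ to } j \text{ between } t \text{ and } t+1 \\ 0 & \text{otherwise} \end{cases} \tag{3}$$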



while for the ant-quantity: 



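$$\Delta\tau_{ij}^{k}(t,t+1) = \begin{cases} \dfrac{Q_2}{d_{ij}} & \text{if the } k\text{-th ant goes from } i \text{ to } j \text{ between } t \text{ and } t+1 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$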



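and for ant-cycle:

$$\Delta\tau_{ij}^{k}(t,t+n) = \begin{cases} \dfrac{Q_3}{L_k} & \text{if the } k\text{-th ant used edge } (i,j) \text{ in its tour} \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

$L_k$ is the length of the tour done by the $k$-th ant. $Q_1$, $Q_2$ and $Q_3$ are constants.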
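The update rules (1), (2) and (5) can be sketched in Python as follows. This is a minimal illustration, not the implementation of [1]: the function names, the dense list-of-lists matrices and the convention of depositing only on the directed edge $(a,b)$ as traversed (an ATSP view; a symmetric TSP version would also update $(b,a)$) are our assumptions. Only the ant-cycle deposit is shown; (3) and (4) differ only in depositing $Q_1$ or $Q_2/d_{ij}$ after every single step.

```python
def transition_probabilities(i, allowed, tau, eta, alpha, beta):
    """Rule (2): probability p_ij of moving from town i to each allowed town j."""
    weights = {j: (tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in allowed}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def ant_cycle_deposit(tours, lengths, n_towns, q3):
    """Rule (5): ant k adds Q3 / L_k to every edge (a, b) of its closed tour."""
    delta = [[0.0] * n_towns for _ in range(n_towns)]
    for tour, length in zip(tours, lengths):
        # Pair each town with its successor, closing the tour back to the start.
        for a, b in zip(tour, tour[1:] + tour[:1]):
            delta[a][b] += q3 / length
    return delta


def update_trails(tau, delta, rho):
    """Rule (1): tau_ij <- rho * tau_ij + delta_tau_ij for every edge."""
    n = len(tau)
    return [[rho * tau[i][j] + delta[i][j] for j in range(n)] for i in range(n)]
```

For example, with unit trails, $\alpha = \beta = 1$ and distances 1 and 2 from town 0 to towns 1 and 2, `transition_probabilities` gives town 1 probability 2/3 and town 2 probability 1/3.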



2.1 Formalization of Ant-Quantity and Ant-Density 

TD(0) View We can view ant-density and ant-quantity (see (1) and (3) or (4))
as TD(0) algorithms with discount factor $\gamma = 0$. Let the different towns in the
TSP stand for the states of the environment in which the reinforcement
learner, here each individual ant, tries to learn. The possible actions
an ant can take in a certain state are the edges it can follow to end
up in the next town/state. The value function one tries to learn is the amount of
pheromone trail on each edge. This value belongs to a town $i$ and an edge leading
to a different town $j$, so the values $\tau_{ij}$ are state-action values. In
reinforcement learning terminology: $Q(s,a) = Q(i,ij) = \tau_{ij}$. As this value only depends on
local information (see (3) and (4)) and no long-term reward is taken into account,
we conclude that $\gamma = 0$. As a result, learning only depends on the immediate
reinforcement the learner receives for taking action $ij$ in state $i$. In this view, (1)
(with $h = 1$) closely resembles the update rule of a TD(0) algorithm:



$$Q_{t+1}(i,ij) \leftarrow Q_t(i,ij) + \alpha\,\big(r_{t+1} + \gamma\,Q_t(j,jk) - Q_t(i,ij)\big)$$

where $r_{t+1}$ is the immediate reinforcement, $\alpha$ the step-size parameter and $jk$
the next action considered. When $\gamma = 0$ this becomes:

$$Q_{t+1}(i,ij) \leftarrow Q_t(i,ij) + \alpha\,\big[r_{t+1} - Q_t(i,ij)\big] = (1-\alpha)\,Q_t(i,ij) + \alpha\,r_{t+1} \tag{6}$$

Comparing this to update rule (1) shows that the step-size parameter $\alpha$ can be
considered as $1-\rho$, and that $\alpha\,r_{t+1} = \Delta\tau_{ij}(t,t+1)$, i.e. the immediate
reinforcement really is $r_{t+1} = \Delta\tau_{ij}(t,t+1)/(1-\rho)$. As a step-size parameter
is usually taken to be very small, these equalities seem very natural. The fact that the
ants' action-selection procedure given in rule (2) is not genuinely $\epsilon$-greedy¹ (towns
already visited are explicitly excluded) should not be a problem, because
several ants walk around, one starting in every town. This
is comparable to the exploring-starts condition in [5]. The difference from the
TD(0) algorithm is that state-action pairs in (1) are also updated when the
action is not selected. For every time step in which no ant crossed link
$ij$, the value $\tau_{ij}$ is updated according to $\tau_{ij}(t+1) \leftarrow \rho\,\tau_{ij}(t)$. This
means that an action is punished for not being taken. One could simply see
this punishment as coming from the environment.
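A quick numerical check of this correspondence, for a single edge and arbitrary illustrative values: rule (1) with $h = 1$ and rule (6) with $\alpha = 1-\rho$ and $r_{t+1} = \Delta\tau_{ij}(t,t+1)/(1-\rho)$ give the same new trail value, including the punishment-by-decay case $\Delta\tau_{ij} = 0$. The function names are hypothetical.

```python
def ant_density_update(tau, delta_tau, rho):
    """Rule (1) with h = 1 for a single edge: rho * tau + delta_tau."""
    return rho * tau + delta_tau


def td0_update(q, reward, alpha):
    """Rule (6): TD(0) with gamma = 0, i.e. (1 - alpha) * Q + alpha * r."""
    return (1 - alpha) * q + alpha * reward


def as_td0(tau, delta_tau, rho):
    """Rule (6) with alpha = 1 - rho and r = delta_tau / (1 - rho); needs rho < 1."""
    alpha = 1 - rho
    return td0_update(tau, delta_tau / alpha, alpha)
```

For instance, with $\tau = 1$, $\Delta\tau = 0.5$ and $\rho = 0.9$, both `ant_density_update` and `as_td0` return 1.4 up to floating-point rounding.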

We also found this behavior in the theory of Learning Automata [3]. A
learning automaton is essentially a vector of action probabilities, which can be
updated with various learning schemes. For instance, in the reward-penalty scheme,
actions are rewarded when they prove to be good or others prove to be bad.
Conversely, actions are punished when they prove to be bad or others prove to
be good. Being good in the ant-density and ant-quantity environment can then
be interpreted as being visited. However, the precise update scheme here does
not seem to be linear. Since little is known about non-linear update schemes, this
should be analyzed further.



Decay eligibility trace Another place where we find this kind of updating of
state-action values without their being selected is the TD($\lambda$) family of algorithms,
where a trace of eligibility marks states or state-action pairs as eligible for
undergoing learning changes, not only from the immediate reinforcement but also from
reinforcements received for actions taken in the future, see [5]. However, the
definition of the trace here would reduce to 1 for the state-action pair currently
visited and to 0 elsewhere, because state-action values only depend on the
immediate reinforcement. So we have to introduce another kind of trace for the
decay of $\tau_{ij}$, which we will call the decay trace. Let us write such a trace
starting from rule (6):

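$$\Delta Q_{t,t+1}(i,ij) \leftarrow -\alpha\,Q_t(i,ij) + \alpha\,r_{t+1} = -\alpha(1-\lambda')\sum_{k=0}^{\infty}(\lambda')^{k}\,Q_t(i,ij) + \alpha\,r_{t+1}$$

$$= -\alpha(1-\lambda')\,Q_t(i,ij) - \alpha(1-\lambda')\sum_{k=1}^{\infty}(\lambda')^{k}\,Q_t(i,ij) + \alpha\,r_{t+1}$$

using $(1-\lambda')\sum_{k=0}^{\infty}(\lambda')^{k} = 1$ for $0 \le \lambda' < 1$.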



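This can be separated into two update rules:

$$\Delta Q_{t,t+1}(i,ij) \leftarrow \begin{cases} -\alpha(1-\lambda')\,Q_t(i,ij)\,e_t(i,ij) + \alpha\,r_{t+1} & \text{if } ij \text{ is visited} \\ -\alpha(1-\lambda')\,Q_t(i,ij)\,e_t(i,ij) & \text{otherwise} \end{cases} \tag{7}$$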



¹ In an $\epsilon$-greedy policy, most of the time an action with maximal estimated
value is chosen, but with a nonzero probability $\epsilon$ a random action is chosen.
where $e_t(i,ij)$ is taken as:

$$e_t(i,ij) = \begin{cases} \lambda'\,e_{t-1}(i,ij) & \text{if } (i,ij) \neq (i_t, ij_t) \\ 1 & \text{if } (i,ij) = (i_t, ij_t) \end{cases}$$



What happened is that the computation of $Q_t(i,ij)$ was spread over subsequent
time steps, just as in equation (1). Comparing rule (1) (with $h = 1$) with (7), we recover
for $r_{t+1}$ the same relation to $\Delta\tau_{ij}(t,t+1)$ that we derived in our interpretation
of TD(0). Furthermore, $(1-\rho)$ can be interpreted as $\alpha(1-\lambda')$, i.e. our earlier
learning factor multiplied by a factor $(1-\lambda')$.
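The two ingredients of the decay-trace derivation can be checked numerically. The sketch below, with hypothetical function names and illustrative values, (a) truncates the geometric series and shows that $\alpha(1-\lambda')\sum_k(\lambda')^k Q$ approaches $\alpha Q$, and (b) computes the trace $e_t$ of one state-action pair from a sequence of visit flags, following the definition of $e_t(i,ij)$ above and starting, as our assumption, from $e_0 = 0$.

```python
def truncated_decay(q, alpha, lam, k_terms):
    """alpha * (1 - lam) * sum_{k=0}^{K-1} lam**k * q; tends to alpha * q as K grows."""
    return alpha * (1 - lam) * sum(lam ** k * q for k in range(k_terms))


def decay_trace(visits, lam):
    """Decay trace of one state-action pair: 1 when visited, else lam * previous value."""
    e, traces = 0.0, []  # e_0 = 0 before the first visit (assumption)
    for visited in visits:
        e = 1.0 if visited else lam * e
        traces.append(e)
    return traces
```

With $\lambda' = 0.5$, the visit pattern visited, not, not, visited yields the trace 1, 0.5, 0.25, 1: the decay applied to an unvisited pair shrinks geometrically until the pair is visited again.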



2.2 Modeling the Ant Cycle Algorithm 

Monte Carlo View For the ant-cycle algorithm, the obvious choice was to try
to characterize it as a Monte Carlo method, see [5]. The general update rule for
an every-visit MC method is given by:

$$Q_{t+n}(s,a) \leftarrow Q_t(s,a) + \alpha\,\big(R_t - Q_t(s,a)\big) \tag{8}$$

$\alpha$ is a constant step-size parameter and $R_t$ is the actual return of the episode:

$$R_t = r_{t+1} + \gamma\,r_{t+2} + \gamma^{2}\,r_{t+3} + \dots + \gamma^{T-t-1}\,r_T \tag{9}$$

$T$ is the last time step of the episode. Comparing (8) with (1) and (5) (with $h = n$)
leads to the following equalities: $\alpha = 1-\rho$ and $R_t = \Delta\tau_{ij}(t,t+n)/(1-\rho)$.
We make a similar assumption as in the previous subsection: state-action pairs that
were never visited during the current episode are punished by the environment, which
decays their value by $\rho$ for the next episode. We can write out $R_t$ as in (9):

$$R_t = \sum_{k=1}^{m'} \frac{Q_3}{(1-\rho)\,L_k} = \sum_{k=1}^{m'} \frac{Q_3\,L_k}{(1-\rho)\,L_k^{2}} = \sum_{h=1}^{n} \sum_{k=1}^{m'} \frac{Q_3\,d_{t+h}^{k}}{(1-\rho)\,L_k^{2}}$$

where $m'$ is the total number of ants that pass link $ij$ in their tours in the current
episode, and $d_t^{k}$ is the length of the edge the $k$-th ant crosses at time step $t$
(so that $L_k = \sum_{h=1}^{n} d_{t+h}^{k}$).

Therefore $r_{t+h} = \sum_{k=1}^{m'} Q_3\,d_{t+h}^{k}/\big((1-\rho)\,L_k^{2}\big)$ and $\gamma = 1$, meaning
that we are considering the reinforcements of the different actions at later time moments
with equal weight (there is no preference in the sequence of the towns). This
update is actually an off-line TD($\lambda$) method with $\lambda = 1$, see [5]. Furthermore,
the update is on-policy² because the tour length is used in rule (5).

² On-policy methods attempt to evaluate or improve the value of a policy while using it
for control. In off-policy methods the policy used to generate behavior is not related
to the one that is being evaluated or updated.

Decay eligibility trace To handle the update of non-visited state-action pairs,
we use the idea of eligibility traces again. Since we are considering a TD($\lambda$)
method with $\lambda = 1$ and $\gamma = 1$ here, we could define the classical eligibility trace.
However, since updating can only be done off-line, it would not be interesting
here; moreover, it is not the reinforcement (as is the case in classical TD($\lambda$)) but
the decay of $\tau_{ij}$ that is spread over subsequent periods. As in (7), we have
two update rules, but now the decay is over episodes:






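$$\Delta Q_{t,t+n}(i,ij) \leftarrow \begin{cases} -\alpha(1-\lambda')\,Q_t(i,ij)\,e_t(i,ij) + \alpha\,R_t & \text{if } ij \text{ was visited} \\ -\alpha(1-\lambda')\,Q_t(i,ij)\,e_t(i,ij) & \text{otherwise} \end{cases}$$

$$e_{t+n}(i,ij) = \begin{cases} 1 & \text{if } ij \text{ was visited} \\ \lambda'\,e_t(i,ij) & \text{otherwise} \end{cases}$$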
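The Monte Carlo reading of ant-cycle can be checked on a toy episode. The sketch below (hypothetical names and values) assumes that every listed ant used edge $ij$, so they are the $m'$ ants of the sum. It computes $R_t$ for that edge in two ways: directly as $\sum_k Q_3/((1-\rho)L_k)$, and as an undiscounted ($\gamma = 1$) sum of per-step rewards $Q_3\,d^k/((1-\rho)L_k^2)$. It then verifies that the every-visit MC update (8) with $\alpha = 1-\rho$ reproduces rule (1) with the ant-cycle deposit (5).

```python
def mc_update(q, ret, alpha):
    """Rule (8): every-visit Monte Carlo update Q <- Q + alpha * (R - Q)."""
    return q + alpha * (ret - q)


def episode_return(tour_lengths, q3, rho):
    """R_t for one edge: sum over the m' ants that used it of Q3 / ((1 - rho) * L_k)."""
    return sum(q3 / ((1 - rho) * length) for length in tour_lengths)


def return_from_steps(edge_lengths_per_ant, q3, rho):
    """Same R_t as an undiscounted (gamma = 1) sum of per-step rewards
    r = Q3 * d / ((1 - rho) * L_k**2) over every edge length d of each ant's tour."""
    total = 0.0
    for edges in edge_lengths_per_ant:
        length = sum(edges)  # L_k is the sum of the edge lengths of the tour
        total += sum(q3 * d / ((1 - rho) * length ** 2) for d in edges)
    return total
```

With two ants whose tours have edge lengths (1, 1, 2) and (2, 2, 2), $Q_3 = 1$ and $\rho = 0.5$, both returns equal $1/2 + 1/3$, and `mc_update(1.0, R, 0.5)` equals $0.5 \cdot 1 + 1/4 + 1/6$, exactly what rule (1) with $h = n$ gives.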



3 Conclusions 

In this paper we conclude that ant-density and ant-quantity can be
considered as TD(0) algorithms without long-term reward and with a special case
of eligibility trace. Therefore, finding a globally optimal solution requires a good
heuristic to guide the exploration and to obtain better performance than
random search. Since the ants explore the search space starting from different
towns, the chance of getting stuck in a local minimum is reduced. This is confirmed
by the parameter-setting experiments in [1]: the best results were
achieved by setting $\beta$ high relative to $\alpha$, thus giving more
weight to the heuristic than to the auto-catalytic process. On the other hand,
ant-cycle is more robust, because it is basically an MC method, which is well suited
for optimizing multi-stage decision problems. In particular, it is an on-policy MC
method using a soft action-selection mechanism. Again, this is confirmed by the
experiments run by the authors of [1]: ant-cycle outperforms the other two. The
first two algorithms reduce to an uninteresting case of reinforcement learning,
whereas ant-cycle can be classified as a better learner.
Abstract. The self-organizing properties of ant colonies are employed
to tackle the classical combinatorial optimization problem of graph
partitioning. The graph is mapped onto an artificial environment in a manner
that preserves the structural information. Ants from a number of colonies
compete for resources. This leads to a restructuring of the global
environment corresponding to a good partition. On the example graphs, this
is shown to outperform the current best algorithms, which are based on
recursive bisection techniques.



1 Introduction 

Simulation of fluid flow with the Finite Element Method approximates solution
values across a domain using a set of discretized elements. In two-dimensional
problems the elements are planar polygons. Partitioning the domain for parallel
processing can be mapped onto a graph partitioning problem where each vertex
in the graph represents a node in the mesh, whilst an edge in the graph represents
the need for communication between two nodes. The graph must be broken into
sub-domains of approximately equal size with as little communication between
sub-domains as possible.

Many graph partitioning problems are NP-complete [5], and therefore we look
for a near-optimal partition in reasonable time. Most partitioning methods employ
recursive bisection, which can often provide a partition that is far from optimal [14]
as regards minimizing the number of edge cuts: what seems optimal at the top level
of recursion may, with hindsight, provide a poor partition at lower levels. Recursive
Spectral Bisection has been shown to be highly effective compared to alternative
methods [13]. Multilevel implementations [8] are yet more effective, especially in
combination with local refinement using Kernighan-Lin [9]. Recently many methods
have been generalized to partition a graph into more than two sets at each stage of
recursion [7]. It is thought that such methods could potentially produce better
partitions. However, the direct computation of a good k-way partitioning is harder
than a bisection, and hence recursive bisection is most commonly used. Dorigo
et al. have applied the trail-laying properties of ant colonies to the Travelling
Salesman Problem, the Quadratic Assignment Problem and routing in
telecommunication networks, showing high-quality results [3], [4], [1]. Kuntz et al.
tackled the graph partitioning problem using a clustering algorithm [10] based on brood
sorting in ant colonies [2] and a swarm colonization technique [11]. Such methods
have shown good results on small graphs.

2 The Approach 

Initially, we consider the bisection case, in which two competing colonies of ants
are used to split the graph into two partitions. Each colony is centred around
a fixed cell in a grid which represents the environment in which the ants can
navigate. The ants must learn to forage for food; each piece of food on the grid
represents a node in the mesh which is being partitioned. The ants must find all
the food and place it in the appropriate nest, so that the set of nodes represented
by the food in $Nest_1$ forms a set $V_1$ and the set of nodes in $Nest_2$ forms a set
$V_2$. The graph bisection problem for a graph $G = (V,E)$ with vertices $V$ and
edges $E$ seeks a partition $V = V_1 \cup V_2$ such that $V_1 \cap V_2 = \emptyset$, $|V_1| \approx |V_2|$ and
the number of cut edges $|E_c|$ is minimized, where:

$$E_c = \{(v_1, v_2) \in E \mid v_1 \in V_1,\ v_2 \in V_2\}$$
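As an illustration of the objective only (not of the ant algorithm itself), the cut set $E_c$ and the balance condition can be computed for any assignment of vertices to parts. The sketch below is written for $k$ parts, so it also covers quadrisection and octrisection. The representation (an edge list plus a vertex-to-part dictionary) and the reading of $|V_1| \approx |V_2|$ as "sizes differ by at most one", which matches the cap on colony size used below, are our assumptions.

```python
from collections import Counter


def cut_edges(edges, part):
    """Edges whose two endpoints are assigned to different parts (the set E_c)."""
    return [(u, v) for u, v in edges if part[u] != part[v]]


def is_balanced(part, k):
    """True if the sizes of parts 0..k-1 differ by at most one vertex."""
    sizes = Counter(part.values())
    counts = [sizes.get(p, 0) for p in range(k)]
    return max(counts) - min(counts) <= 1
```

For a 4-cycle 0-1-2-3-0 split as {0, 1} and {2, 3}, `cut_edges` returns the two crossing edges (1, 2) and (3, 0), and the split is balanced.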

In order to cut down the search space we map the nodes onto the grid in a 
manner that represents the structure of the mesh by mapping the geometrical 
position of nodes in the mesh onto the equivalent geometrical position on the 
grid. To encourage the colonies to find a reasonable initial partition we place 
each nest at the center of an equal number of nodes using a recursive bisection 
heuristic. To further cut down search and utilize the structural information on 
the grid we prevent any colony from exceeding the desired number of nodes by 
more than one and prevent ants from taking food from other colonies until their 
colony has acquired 90 percent of the required number of nodes. The main body 
of the loop, as shown below, is executed for each ant in each colony during every 
time step. The grid is initialized by placing the food and locating the nest for 
each colony. The body is iterated until there is no reduction in the number of 
cuts over 500 time steps. The best partition found during this time is taken to 
be the solution. 

if (carrying_food) then
    if (in_nest_locus) then (drop_food)
    else (move_to_nest)
else
    if (food_here) then (pick_up_food)
    else if (food_ahead) then (move_forward)
    else if (in_nest_locus) then (move_to_away_pheromone)
    else if (help_signal) then (move_to_help)
    else (move_to_strong_forward_pheromone)

The actions and functions in this code are detailed below in Table 1. It should
be noted that this foraging strategy can be generated using Genetic Programming [12].
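The decision structure of the main loop can be expressed as a small Python function. This is a sketch: the `Senses` record bundling the Boolean tests of Table 1 is a hypothetical helper, and the function only returns the name of the chosen action rather than modifying a grid.

```python
from dataclasses import dataclass


@dataclass
class Senses:
    """Hypothetical bundle of the Boolean tests from Table 1 for one ant."""
    carrying_food: bool
    in_nest_locus: bool
    food_here: bool
    food_ahead: bool
    help_signal: bool


def choose_action(s: Senses) -> str:
    """Return the name of the action selected by the main-loop rules."""
    if s.carrying_food:
        # A loaded ant either unloads near its nest or heads home.
        return "drop_food" if s.in_nest_locus else "move_to_nest"
    if s.food_here:
        return "pick_up_food"
    if s.food_ahead:
        return "move_forward"
    if s.in_nest_locus:
        return "move_to_away_pheromone"
    if s.help_signal:
        return "move_to_help"
    return "move_to_strong_forward_pheromone"
```

The order of the `if` tests encodes the priority of the rules: carrying food overrides everything, and following the pheromone trail is the default.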



Table 1. Actions and Functions for Main Algorithm

Function/Action                   Explanation
if_food_ahead                     Return True if food is in the adjacent square.
if_food_here                      Return True if food is in the current square.
if_in_nest_locus                  Return True if the ant is within two grid squares
                                  of the colony nest.
if_carrying_food                  Return True if the ant is carrying food.
if_help_signal                    Return True if there is a help signal within a
                                  locus of two grid squares.
move_to_nest                      Move one step towards the nest in direction:
                                  if (|Nest.x - x| > |Nest.y - y|)
                                  then if (x > Nest.x) then dir = WEST
                                       else dir = EAST
                                  else if (y > Nest.y) then dir = SOUTH
                                       else dir = NORTH.
move_to_help                      Move one step towards the nearest help signal.
move_to_away_pheromone            Move one step away from the nest in the away
                                  direction with the strongest pheromone trail.
move_to_strong_forward_pheromone  Move to an adjacent square with probability
                                  dependent on the amount of pheromone present.
pick_up_food (foraging)           Food not belonging to a colony is picked up if the
                                  proportion of cuts created is less than half the
                                  total possible number. The number of ants needed
                                  is related to the proportion of cuts. A help signal
                                  is sent out if there are not enough ants.
pick_up_food (raiding)            Food already in a colony is picked up with
                                  probability p dependent on the change in the
                                  proportion of cuts δpc, where
                                  p = 1.0               if δpc > 0
                                  p = 0.5               if δpc = 0
                                  p = 0.0               if δpc < -0.33
                                  p = 1/(w · (δpc)²)    if -0.33 ≤ δpc < 0
                                  and w depends on δpc.
drop_food                         Place food around the colony nest using a
                                  clockwise search for an empty cell.
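The `move_to_nest` rule of Table 1 translates directly into code. In this sketch the function name is taken from the table and coordinates are integers; we assume that $y$ grows towards NORTH (so $y > Nest.y$ means the nest lies to the south), and the step offsets in the hypothetical `STEP` table and the helper `step_towards_nest` are our assumptions.

```python
# Unit offsets per compass direction (assumption: y grows towards NORTH).
STEP = {"NORTH": (0, 1), "SOUTH": (0, -1), "EAST": (1, 0), "WEST": (-1, 0)}


def move_to_nest(x, y, nest_x, nest_y):
    """Direction of one step towards the nest, following the rule in Table 1."""
    if abs(nest_x - x) > abs(nest_y - y):
        return "WEST" if x > nest_x else "EAST"
    return "SOUTH" if y > nest_y else "NORTH"


def step_towards_nest(x, y, nest_x, nest_y):
    """New position after one move_to_nest step."""
    dx, dy = STEP[move_to_nest(x, y, nest_x, nest_y)]
    return x + dx, y + dy
```

The rule closes the larger of the two coordinate gaps first; ties go to the vertical axis. An ant standing exactly on the nest would be sent NORTH, but in the main loop such an ant is inside the nest locus and drops its food instead.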
3 Results

The Ant Foraging Strategy (AFS) is compared with Recursive Spectral Bisection
(RSB), Recursive Spectral Bisection plus Kernighan-Lin (RSB+KL) and
Multilevel Kernighan-Lin (ML-KL). Each method is tested on two two-dimensional
meshes with two, four and eight partitions. The recursive bisection (rb) and
simultaneous multiple partitioning (k-way) methods are implemented in the Chaco 2.0
package [6]. The numbers of cuts for the partitions produced on the two finite element
meshes are given in Table 2. Mesh 1 (114 nodes, 308 edges) is a triangular mesh
with uniform node density; the food was placed on an 11 by 11 grid. Mesh 2 (a test
case from the Chaco package, 286 nodes, 1046 edges) is a quadrilateral mesh
with variable density; the food was placed on a 25 by 25 grid. The total number
of ants on the grid is always equal to 45 percent of the total grid area. ML-KL
is coarsened to 30 nodes for Mesh 1 and 50 nodes for Mesh 2.



Table 2. Number of Cuts Created by Various Partitioning Methods

                                     RSB           KL+RSB        ML-KL         AFS
Mesh  Partition                      rb    k-way   rb    k-way   rb    k-way   k-way
1     Bisection                      ?     25      25    25      25    28      25
1     Quadrisection                  ?     51      51    50      52    55      49
1     Octrisection                   ?     103     87    97      84    95      81
1     Time (secs), octrisection      ?     0.2     0.16  0.31    0.18  0.53    ?
2     Bisection                      ?     29      28    28      28    28      28
2     Quadrisection                  ?     91      90    90      88    90      87
2     Octrisection                   ?     195     173   191     164   187     162
2     Time (secs), octrisection      ?     0.29    0.27  0.46    0.29  0.49    8.73



4 Discussion and Further Work 

The Ant Foraging Strategy shows some good results on the test meshes. It
performs better than RSB, RSB+KL and ML-KL when these are implemented for
quadrisection and octrisection. It also performs better than the recursive bisection
versions of RSB, RSB+KL and ML-KL, because it does not depend on partitions found
at higher levels of recursion. AFS is relatively efficient compared to other swarm
methods, as it maps the structure onto the environment and places nests to provide
a rough initial partition quickly using a small population. It is also reasonably
comparable to multilevel octrisection methods for uniform-density meshes. However,
variable-density meshes need a relatively larger grid to represent sufficient
structural information. Hence a method for mapping food onto the grid that
represents structural rather than geometrical information is needed to make AFS
more efficient for arbitrary meshes.
5 Conclusion

Most approaches to the k-way partitioning problem use recursive bisection and
local improvement techniques, except spectral quadrisection and octrisection,
which perform relatively poorly. AFS provides a novel global method for tackling
the problem, with built-in local improvement, which can simultaneously partition
into as many sets as required. Results show that swarm-based simultaneous
multiple partitioning deserves further investigation, as it can produce better partitions
than recursive bisection techniques, which depend on the partition found at
higher levels of recursion. Furthermore, AFS is relatively efficient compared to
other swarm-based methods and scales up well to large meshes [12] when multilevel
methods are applied and food is placed on the grid using a BFS technique
to provide more structural information.
Abstract. We present in this paper a new hybrid algorithm for data
clustering. This algorithm automatically discovers clusters in numerical
data without prior knowledge of a possible number of classes, without
any initial partition, and without complex parameter settings. It combines
the stochastic and exploratory principles of an ant colony with the
deterministic and heuristic principles of the K-means algorithm. Ants move
on a 2D board and may load or drop objects. Dropping an object on an
existing heap of objects depends on the similarity between this object
and the heap. The K-means algorithm improves the convergence of the
ant colony clustering. We repeat two stochastic/deterministic steps and
introduce hierarchical clustering on heaps of objects and not just objects.
We also use other refinements, such as a heterogeneous population of
ants to avoid complex parameter settings, and a local memory in each
ant. We have applied this algorithm on standard databases and obtain
very good results compared to the K-means and ISODATA algorithms.



1 Introduction 

Clustering is one of the problems to which artificial ants have been applied with
success in their early developments. Deneubourg and his colleagues are probably
the pioneers in this domain with their work on robotics (Deneubourg et al. 1990,
Goss and Deneubourg 1991), which has been used as a basis for more recent work
involving artificial ants. For instance, clustering ants have been used in a
real-world application in the context of VLSI technology (Kuntz and Snyers 1994,
Kuntz et al. 1997). The problem here is to find a relevant partitioning of a graph.
For this purpose, this partitioning problem is turned into a clustering problem
which is solved by artificial ants. Ants can pick up and drop objects on a 2D board
according to a local density measure of similar objects in the neighborhood of
the ant. In (Lumer and Faieta 1994), artificial ants cluster together data from
a numerical database. This work is important for us because it introduces the
concept of clustering ants in data analysis and knowledge discovery in databases
problems.
There are at least two motivations for using an ant-based algorithm in a
clustering problem. In data clustering (Jain 1998), many algorithms (like
K-means and ISODATA (Ball and Hall 1965), used in the following) require that
an initial partition be given as input before the data can be processed. This
is one major drawback of these methods, and it is important to notice that
ant-based approaches to clustering do not require such an initial partitioning.
One should also notice that many clustering methods are entirely based on
heuristics, and may run fast but very often find only local optima. One way to
improve those methods is, for instance, to introduce a stochastic search rather
than a deterministic one.

So we propose in this paper a new method called AntClass, which is based
initially on Lumer and Faieta's work, but with major extensions such as introducing
more robust ant-like heuristics, dealing with “unassigned objects”, speeding up
convergence with the K-means algorithm, using hierarchical clustering on heaps
of objects, testing the resulting algorithm on several real-world data sets and
providing a successful comparison with the K-means and ISODATA algorithms.

The remainder of this paper is organized as follows: Section 2 describes the
ant-based heuristics of AntClass. Section 3 presents the hybrid part of AntClass, i.e.
the use of the K-means algorithm, and the hierarchical clustering technique. Section
4 describes the experiments which have been performed with AntClass on
artificial and real-world data sets, as well as a comparative study. Section 5
concludes with future work and extensions to AntClass.

2 Ant-based principles of AntClass 

2.1 Objects, heaps and the 2D board 

We assume that a set $E = \{O_1, \dots, O_n\}$ of $n$ data or objects has been
collected by the domain expert, where each object is a vector of $k$ numerical values
$v_1, \dots, v_k$. For measuring the similarity between objects we will use in the
following the Euclidean distance between two vectors, denoted by $D$. $D_{max}$ will
denote the maximum distance value between two objects of $E$, i.e.
$D_{max} = \max_{O_i, O_j \in E} D(O_i, O_j)$.

Initially all the objects are scattered randomly on a 2D board. This board
can be considered as a 2D matrix $C$ of $m \times m$ cells. This matrix is also considered
to be toroidal in order to let the ants travel from one side to the other in one
step. A first important point, which has not been studied previously, is
to determine the size $m$ of the board automatically. The number of cells
in the matrix has to be greater than the number of objects ($m^2 > n$).
But if the board is too large, the ants will waste a lot of time and will need many
moves before encountering an object. So we have chosen the following relation
between $m$ and $n$: $m^2 = 4n$, and we have also observed experimentally that
this method gives good results.
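The sizing rule $m^2 = 4n$ rarely yields an integer side, so some rounding is needed. The sketch below rounds up (our assumption) using exact integer arithmetic via `math.isqrt` (Python 3.8+), and also shows the toroidal wrap-around of coordinates. Both function names are hypothetical.

```python
import math


def board_side(n):
    """Smallest integer m with m * m >= 4 * n (rounding up is an assumption)."""
    m = math.isqrt(4 * n)
    return m if m * m == 4 * n else m + 1


def wrap(coord, m):
    """Toroidal board: a coordinate leaving one side re-enters on the other."""
    return coord % m
```

For example, 150 objects give $4n = 600$ and a 25 by 25 board, while 25 objects give exactly 10 by 10. Python's `%` already returns a non-negative result for a positive modulus, so `wrap(-1, 10)` is 9.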

Ants will be able to create, build or destroy heaps of objects. A heap $H$ is
considered to be a collection of at least two objects. A heap is located on a
single given cell and is not a spatial pattern, as explained in figure 1. The major
advantage of this improvement compared to (Lumer and Faieta 1994) is that a
heap or cluster can be easily identified, and it allows us to define more accurate
heuristics for dropping or removing objects from a heap, as described extensively
in the next section. For instance, ants will be able to remove the most dissimilar
object from a heap, or to add a carried object to a heap provided that this object
is sufficiently similar to the other objects of the heap.

Fig. 1. The definition of a cluster has been changed in AntClass. On the left is
represented a cluster as in (Lumer and Faieta 1994). A cluster is thus a spatial pattern,
but two different clusters can be in contact and may thus be difficult to identify. On the
right is given the new representation of a cluster in AntClass, which solves the previous
problem and allows us to define more robust heuristics.

For this purpose, we define the following notation for a given heap
$H$ of $n_H$ objects:

- $D_{max}(H)$ is the maximum distance between two objects of $H$:

$$D_{max}(H) = \max_{O_i, O_j \in H} D(O_i, O_j)$$

- $O_{center}(H)$ is the center of mass of all objects in $H$:

$$O_{center}(H) = \frac{1}{n_H} \sum_{O_i \in H} O_i$$

Objects in this case are considered as vectors of $k$ numerical values. One
should also notice that $O_{center}(H)$ does not in general correspond to a real
object.

- $O_{dissim}(H)$ is the most dissimilar object in $H$:

$$O_{dissim}(H) = \arg\max_{O_j \in H} D(O_j, O_{center}(H))$$
1. Initialize randomly the ants' positions,

2. Repeat

3. For each ant $Ant_i$ Do

(a) Move $Ant_i$,

(b) If $Ant_i$ does not carry any object Then look at the 8 cells in the neighborhood
of the $Ant_i$ location and possibly pick up an object (see text for explanation),

(c) Else ($Ant_i$ is already carrying an object $O$) look at the 8 cells around $Ant_i$ and
possibly drop $O$ (see text for explanation),

4. Until Stopping criterion.

Fig. 2. The general principles of the AntClass algorithm (which are similar to
(Lumer and Faieta 1994)).



- $D_{mean}(H)$ is the mean distance between the objects of $H$ and the center of
mass $O_{center}(H)$:

$$D_{mean}(H) = \frac{1}{n_H} \sum_{O_i \in H} D(O_i, O_{center}(H))$$
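The four heap statistics can be computed as follows. This sketch assumes objects are tuples of floats and uses `math.dist` (Python 3.8+) for the Euclidean distance $D$; the function name `heap_stats` is hypothetical.

```python
import math
from itertools import combinations


def heap_stats(heap):
    """Return (D_max(H), O_center(H), O_dissim(H), D_mean(H)) for a heap of >= 2 objects."""
    n_h = len(heap)
    k = len(heap[0])
    # Center of mass: component-wise mean of the object vectors.
    center = tuple(sum(obj[i] for obj in heap) / n_h for i in range(k))
    # Largest pairwise Euclidean distance inside the heap.
    d_max = max(math.dist(a, b) for a, b in combinations(heap, 2))
    # Object farthest from the center of mass (the first one on ties).
    dissim = max(heap, key=lambda obj: math.dist(obj, center))
    # Mean distance of the objects to the center of mass.
    d_mean = sum(math.dist(obj, center) for obj in heap) / n_h
    return d_max, center, dissim, d_mean
```

For a two-object heap such as [(0, 0), (2, 0)], the center is (1, 0) and both the distance from $O_{dissim}(H)$ to the center and $D_{mean}(H)$ equal 1. This is the two-object property exploited in the pick-up rules.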



2.2 Ant-based algorithm in AntClass 

The algorithm at the heart of AntClass is represented in figure 2. The colony
consists of $p$ ants $Ant_1, \dots, Ant_p$, with $p = 20$ in the following. Each ant is
located on one cell of the board. Initially this position is generated randomly and
uniformly, and there is absolutely no central control of the colony.

The ants move in the following way: initially, a given ant $Ant_i$ selects a random
direction among the 8 possible ones. Then, $Ant_i$ has a probability $P_{direction}$ of
continuing in this direction when moving next; otherwise it randomly generates
a new direction. Each ant also has a speed parameter, which tells how many
steps it will move in the selected direction before stopping again. Once it has
moved, the ant may pick up or drop an object as described in the next
two paragraphs.
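The movement rule can be sketched as follows. We assume 8-neighbourhood unit offsets, a toroidal board of side `m`, and that an ant with speed $s$ simply advances $s$ cells in its current direction before acting. The names `DIRECTIONS`, `next_direction` and `move` are hypothetical; passing a `random.Random` instance keeps runs reproducible.

```python
import random

# The 8 possible directions as (dx, dy) unit offsets.
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def next_direction(current, p_direction, rng):
    """Keep the current direction with probability P_direction, else draw a new one.
    `current` is None for an ant that has not chosen a direction yet."""
    if current is not None and rng.random() < p_direction:
        return current
    return rng.choice(DIRECTIONS)


def move(pos, direction, speed, m):
    """Advance `speed` cells along `direction` on an m x m toroidal board."""
    x, y = pos
    dx, dy = direction
    return ((x + speed * dx) % m, (y + speed * dy) % m)
```

Because the board is toroidal, an ant leaving one edge re-enters on the opposite edge; for example, moving 3 cells west from (0, 0) on a 10 by 10 board ends at (7, 0).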

When the ant is not carrying any object, it looks for a possible object to pick
up by considering the 8 cells around its current position. As soon as one object
or one heap is found, three cases have to be considered:

1. one object alone: the ant has a fixed probability of picking up the object.

2. a heap of two objects: since there are only two objects in the heap, we have
the following specific property: $D(O_{dissim}(H), O_{center}(H)) = D_{mean}(H)$. So
no real heuristic based on distance $D$ can be applied here. This is why
we simply give the ant a probability $P_{destroy}$ of picking up either of the
two objects, which results in destroying the heap.

3. a heap of more than two objects: the ant picks up the most dissimilar object
in the heap provided that the heap “dissimilarity” is above a given threshold
$T_{remove}$. The dissimilarity is measured by .
Parameter     Role                                                              Range
Speed         amplitude of moves                                                ?
P_direction   prob. to move in the same dir.                                    ?
Max_carry     object max. carrying time                                         ?
P_load        prob. to pick up a single object                                  [0.3, 1[
P_destroy     prob. to destroy a heap of 2 objects                              [0, 0.6]
T_remove      min. dissimilarity necessary for removing an object from a heap   [0.1, 0.2]
T_create      max. dissimilarity permitted for creating a heap of two objects   [0.05, 0.2]

Fig. 3. The ants' parameters in AntClass. All parameters which are crucial for the
results are generated randomly within the indicated bounds.



It is a simple but powerful heuristic, because it makes the heap more
homogeneous.

When the ant is carrying an object, it also looks at the 8 cells around its
current location. For each cell, three cases have to be considered again:

1. the cell is empty: the ant simply has a constant probability of dropping the
object.

2. the cell contains only one object: the ant will drop its object, thus creating
a heap of two objects, provided that the carried object O is sufficiently similar
to the one already in the cell (O'), i.e. if the dissimilarity between O and O'
is below $T_{create}$.

3. the cell contains a heap: the ant will add its carried object to the heap 
provided that it is closer to H’s center than the most dissimilar object of H. 
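The three dropping cases can be sketched as a single decision function. This is a minimal illustration, not the authors' code: `should_drop` and `p_drop` (the constant probability of case 1) are hypothetical names, D is assumed Euclidean, and the case-2 dissimilarity is assumed to be $D(O, O')$ itself, since the exact normalization is not given here.

```python
import math
import random


def should_drop(carried, cell, p_drop, t_create, rng):
    """Decide whether an ant drops its carried object on one neighbouring cell.

    cell is None (empty cell), a tuple (a single object O') or a list of tuples
    (a heap H). Assumed: D is Euclidean and the case-2 test is D(O, O') < t_create.
    """
    if cell is None:  # case 1: empty cell, constant drop probability
        return rng.random() < p_drop
    if isinstance(cell, tuple):  # case 2: single object O', would create a heap of two
        return math.dist(carried, cell) < t_create
    # case 3: heap H, drop only if O is closer to H's center than O_dissim(H) is
    n = len(cell)
    center = tuple(sum(coord) / n for coord in zip(*cell))
    d_dissim = max(math.dist(obj, center) for obj in cell)
    return math.dist(carried, center) < d_dissim


rng = random.Random(0)
heap = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
print(should_drop((0.5, 0.5), heap, 0.1, 0.1, rng))           # True
print(should_drop((2.0, 2.0), heap, 0.1, 0.1, rng))           # False
print(should_drop((0.10, 0.1), (0.12, 0.1), 0.1, 0.05, rng))  # True
```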

To prevent an ant from carrying an object for too long, for instance when the
object is very dissimilar to all the others, the ant automatically drops it on the
first empty cell it encounters after $Max_{carry}$ iterations.

Since real ants are able to memorize several sites in their environment (see for
instance (Fresneau 1985)), we have added a memory to each ant in order to speed
up the classification. Once an ant has encountered a heap H, it may store in its
memory the location of H, $O_{center}(H)$ and $D(O_{dissim}(H), O_{center}(H))$.
The algorithm for dropping an object and the algorithm that defines the ant's
moves are then modified as follows: when the ant is carrying an object, it
searches its memory for a heap H on which it could drop the object. If it finds
one, the memory of this heap is activated and the ant moves towards the location
of H. If it has not dropped the object on its way to H, the ant will drop the
object on H provided that H is still valid, i.e. H has not been destroyed or too
heavily modified by the other ants. Ants may forget about a heap, because they
store heaps in a First In First Out structure, so the oldest entries are lost first.
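A bounded FIFO memory of this kind maps naturally onto `collections.deque` with `maxlen`. The sketch below is hypothetical in its class name, capacity and search rule: it assumes an ant looks for a remembered heap whose stored center is closer to the carried object than the stored $D(O_{dissim}(H), O_{center}(H))$, mirroring the third dropping case.

```python
import math
from collections import deque


class AntMemory:
    """Per-ant FIFO memory of heaps (capacity is a hypothetical parameter).

    Each entry holds what the text lists: the heap location, O_center(H)
    and D(O_dissim(H), O_center(H)).
    """

    def __init__(self, capacity=3):
        # A deque with maxlen discards its oldest entry when a new one is appended.
        self.entries = deque(maxlen=capacity)

    def remember(self, location, center, d_dissim):
        self.entries.append((location, center, d_dissim))

    def find_heap_for(self, obj):
        """Return the location of a remembered heap that could accept obj, or None.

        Assumed criterion: obj must be closer to the stored center than the stored
        most dissimilar object; among such heaps the closest center is preferred.
        """
        best = None
        for location, center, d_dissim in self.entries:
            d = math.dist(obj, center)
            if d < d_dissim and (best is None or d < best[0]):
                best = (d, location)
        return None if best is None else best[1]


memory = AntMemory(capacity=2)
memory.remember((3, 4), (0.2, 0.2), 0.1)
memory.remember((10, 1), (0.8, 0.8), 0.1)
memory.remember((7, 7), (0.5, 0.5), 0.1)  # evicts the heap stored at (3, 4)
print(memory.find_heap_for((0.21, 0.2)))  # None: that heap has been forgotten
print(memory.find_heap_for((0.52, 0.5)))  # (7, 7)
```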

Furthermore, in order to avoid complex parameter settings and to simplify
the use of AntClass by domain experts who may not be computer scientists,
we have used a heterogeneous population of ants with different parameters,
and thus different behaviors. Our parameters are even more heterogeneous
than in (Lumer and Faieta 1994). Initially, the ants' parameters are generated
randomly within the bounds given in figure 3. The same bounds are used in this
paper for all the tested data sets. These bounds have been found by several
trials on the artificial data sets.
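Generating such a heterogeneous colony amounts to drawing each ant's parameters independently within the bounds. This minimal sketch uses only the four ranges that are legible in figure 3 (the bounds for Speed, $P_{direction}$ and $Max_{carry}$ are omitted); the colony size and the function names are hypothetical.

```python
import random

# Ranges legible in figure 3. "[0.3, 1[" excludes 1, so P_load is drawn below 1.
BOUNDS = {
    "P_load": (0.3, 1.0),     # [0.3, 1[
    "P_destroy": (0.0, 0.6),  # [0, 0.6]
    "T_remove": (0.1, 0.2),   # [0.1, 0.2]
    "T_create": (0.05, 0.2),  # [0.05, 0.2]
}


def random_ant_parameters(rng):
    """Draw one ant's parameters uniformly within BOUNDS.

    lo + (hi - lo) * rng.random() lies in [lo, hi) up to floating-point rounding,
    because random() returns values in [0, 1).
    """
    return {name: lo + (hi - lo) * rng.random() for name, (lo, hi) in BOUNDS.items()}


def make_colony(n_ants, seed=0):
    """Heterogeneous colony: every ant receives its own random parameter set."""
    rng = random.Random(seed)
    return [random_ant_parameters(rng) for _ in range(n_ants)]


colony = make_colony(30)
print(colony[0])
```

Because the same bounds are reused for every data set, the user never tunes individual values; the spread of behaviors across the colony replaces manual parameter setting.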

Finally, the stopping criterion of this algorithm is simply the number of 
iterations. 



3 Final AntClass algorithm

3.1 Combining ant-based with heuristic search 

The previous algorithm, based on ants only, has the major advantage of providing
a relevant partition of the data without any initial information about the future
classification, unlike the K-means or ISODATA algorithms for instance. However,
two important problems remain. The first is that some objects are not assigned
to any heap when the ant algorithm stops; we call these “free” objects in this
paper. They are, for instance, objects still carried by the ants or objects lying
alone on the board. The second problem is that if an object has been assigned to
the wrong heap, it can take a long time until the object is transported to the
right cluster.

The solution that we propose is therefore to combine two complementary algorithms,
i.e. ant-based clustering as presented in the previous section and the K-means
algorithm. This heuristic algorithm uses the initial partition provided by the
ants as a “starting point”. The K-means then works as follows: it computes the
center of each cluster, then computes a new partition by assigning every object
to the heap whose center is closest to the object. This cycle is repeated for a
given number of iterations, or until the assignment has not changed during one
cycle.
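The K-means refinement can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `refine_with_kmeans` is a hypothetical name, D is assumed Euclidean, the 10-iteration cap matches the setting reported in section 4.1, and free objects are represented by the label `None`, so that the first assignment step gives them a cluster. Clusters that lose all their objects simply disappear, which the text does not specify.

```python
import math


def refine_with_kmeans(objects, labels, max_iter=10):
    """K-means started from an initial partition (e.g. the heaps built by the ants).

    labels[i] is the heap of objects[i], or None for a "free" object that no heap
    contains; at least one object must be labelled. D is assumed Euclidean.
    """
    labels = list(labels)
    for _ in range(max_iter):
        # Step 1: center of every current cluster (free objects are skipped).
        groups = {}
        for obj, lab in zip(objects, labels):
            if lab is not None:
                groups.setdefault(lab, []).append(obj)
        centers = {lab: tuple(sum(coord) / len(members) for coord in zip(*members))
                   for lab, members in groups.items()}
        # Step 2: assign every object, free ones included, to the closest center.
        new_labels = [min(centers, key=lambda lab: math.dist(obj, centers[lab]))
                      for obj in objects]
        if new_labels == labels:  # assignment unchanged during one cycle: stop
            break
        labels = new_labels
    return labels


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(refine_with_kmeans(points, [0, None, 1, 1]))  # [0, 0, 1, 1]
```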



3.2 Hierarchical clustering on heaps of objects 

We have noticed experimentally that the first two steps of AntClass (ants
+ K-means) usually give good results in terms of misclassification. However, the
number of classes is always overestimated: these first two steps of AntClass
generate many small but very homogeneous heaps. The idea presented in this
section is therefore to consider those small and homogeneous heaps as objects
themselves, or “building blocks”, and to perform the same two steps (ants +
K-means) on these newly defined objects. Another motivation is that hierarchical
clustering is a very standard technique in classification, but it has not yet been
used with ants.

Let us now consider that the ants + K-means hybrid algorithm has led to the
creation of k heaps of objects. In order to let the ants deal with heaps of
objects rather than objects, we have simply adapted the algorithms described
previously: ants are able to carry an entire heap of objects. The algorithm for
picking up a heap is globally the same as for objects, and ants pick up a heap
with a probability $P_{load}$. However, we have added another mechanism to
prevent the ants from carrying all heaps at the same time, because, as will be
seen in the next paragraph, the number of heaps only decreases over time. If only
a few clusters remain on the board, it is important not to carry them all the
time, because the ants would then not be able to cluster them further if needed.
So once a heap has been dropped, it is marked with a kind of pheromone that
prevents other ants from picking up this heap during the next 500 iterations.
Ants drop a heap $H_1$ onto another heap $H_2$ provided that the dissimilarity
between their centers $O_{center}(H_1)$ and $O_{center}(H_2)$ is below $T_{create}$.

When $H_1$ and $H_2$ are clustered together, they form a single heap, which
cannot be separated anymore. This makes the convergence faster.
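The heap-level mechanics (pheromone lock and permanent fusion) can be sketched with a small data class. The 500-iteration lock comes from the text; everything else is an assumption for illustration: the names `Heap`, `can_pick_up` and `drop_heap` are hypothetical, and the similarity test is taken to be the Euclidean distance between heap centers compared with $T_{create}$.

```python
import math
from dataclasses import dataclass

LOCK_ITERATIONS = 500  # from the text: a dropped heap is protected for 500 iterations


@dataclass
class Heap:
    objects: list
    locked_until: int = 0  # iteration before which no ant may pick this heap up

    def center(self):
        n = len(self.objects)
        return tuple(sum(coord) / n for coord in zip(*self.objects))


def can_pick_up(heap, t):
    """A heap marked with 'pheromone' cannot be picked up before locked_until."""
    return t >= heap.locked_until


def drop_heap(carried, target, t, t_create):
    """Drop `carried` onto `target` if their centers are close enough (assumed test).

    On success the heaps are fused for good and the result is pheromone-marked.
    """
    if math.dist(carried.center(), target.center()) >= t_create:
        return False
    target.objects.extend(carried.objects)
    carried.objects = []
    target.locked_until = t + LOCK_ITERATIONS
    return True


h1 = Heap([(0.10, 0.10), (0.12, 0.10)])
h2 = Heap([(0.15, 0.12)])
print(drop_heap(h1, h2, t=1000, t_create=0.1))                          # True
print(len(h2.objects), can_pick_up(h2, 1200), can_pick_up(h2, 1500))  # 3 False True
```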



3.3 AntClass final algorithm

AntClass thus consists of four main steps: (1) an ant-based algorithm for clustering
objects, followed by (2) the K-means algorithm using the initial partition provided
by the ants, then (3) ant-based clustering on the heaps previously found, and
finally (4) the K-means algorithm once more.

Finally, we should add that all values in the data set are normalized between 
0 and 1 in order to avoid any scaling problem between the different attributes. 
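The normalization is a per-attribute linear (min-max) rescaling. A minimal sketch, with the hypothetical function name `min_max_normalize`; mapping a constant attribute to 0.0 is an assumption, since the text does not cover that case.

```python
def min_max_normalize(rows):
    """Scale every attribute linearly to [0, 1].

    A constant attribute is mapped to 0.0 (an assumption; the text does not
    cover this case).
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


print(min_max_normalize([(1.0, 10.0), (3.0, 20.0), (2.0, 15.0)]))
# [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
```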

4 Experimental results 

4.1 Experimental settings 

We have applied AntClass to the following numerical databases (the numbers in
brackets indicate, respectively, the number of objects, the number of numerical
attributes, and the number of classes): Artif. 1 (80, 2, 4), Artif. 2 (270, 2, 9),
Artif. 3 (200, 2, 4), Artif. 4 (150, 10, 3), Iris (150, 4, 3), Wine (178, 13, 3),
Glass (214, 9, 2-6)¹, Soybean (47, 21, 4), Thyroid (215, 5, 3). “Artif. 1” to
“Artif. 4” have been used to evaluate AntClass on databases with known properties,
where the examples are generated according to Gaussian laws (in the same way as
Lumer and Faieta). The other, real world databases come from the UCI machine
learning repository.

In order to evaluate the resulting partition obtained by AntClass (or the 
K-means and ISODATA algorithms in the comparative study of section 4.3), 
we have set up the following method. We have used databases for supervised 
learning, i.e. databases where the true classes are known thanks to a “Class” 
attribute. But when AntClass is used to partition the data, this “Class” attribute 
is not given to it. It is only a posteriori that the class information is used to 
evaluate the results. 

¹ From 2 to 6 classes can be defined for the glass database.

Databases and perf. | 1: Ant colony on objects | 2: K-means on objects | 3: Ant colony on heaps | 4: K-means on heaps
Artif. 1: Cl. err. | 11.58 % | 0.21 % | 0.42 % | 0.00 %
Artif. 1: # of cla. | 8.15 | 7.76 | 4.24 | 4
Artif. 2: Cl. err. | 17.24 % | 0.52 % | 2.22 % | 0.00 %
Artif. 2: # of cla. | 22.30 | 17.07 | 10.46 | 9.02
Artif. 3: Cl. err. | 20.35 % | 6.32 % | 6.93 % | 4.66 %
Artif. 3: # of cla. | 15.06 | 14.98 | 5.42 | 4.42
Artif. 4: Cl. err. | 22.23 % | 3.32 % | 2.68 % | 1.33 %
Artif. 4: # of cla. | 5.22 | 5.18 | 2.94 | 2.96

Table 1. Intermediary and final results obtained at each of the four steps of AntClass
for the four artificial databases. “Cl. err.” stands for “classification error rate” and
“# of cla.” for “number of classes”.

We have defined two performance measures to evaluate how close the obtained
partition is to the real one. The first measure is a classification error rate. It is
computed as follows: for a given cluster H obtained by AntClass, consider the most
represented class among H according to the “Class” attribute. All objects of H that
do not belong to this class are considered as misclassified. The classification error
rate is simply the ratio between the total number of misclassified objects over all
created clusters and the total number of objects in the database. The second
performance measure is simply the number of created clusters.
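The classification error rate defined above can be computed from the cluster assignment and the a posteriori “Class” attribute. A minimal sketch; the function name is hypothetical.

```python
from collections import Counter


def classification_error_rate(cluster_ids, true_classes):
    """Share of objects that do not belong to the majority real class of their cluster."""
    members = {}
    for cid, cls in zip(cluster_ids, true_classes):
        members.setdefault(cid, []).append(cls)
    misclassified = sum(len(m) - Counter(m).most_common(1)[0][1]
                        for m in members.values())
    return misclassified / len(true_classes)


clusters = [0, 0, 0, 1, 1, 1]
classes = ["a", "a", "b", "b", "b", "c"]
print(classification_error_rate(clusters, classes))  # 2 misclassified / 6 objects
print(len(set(clusters)))                             # second measure: 2 clusters
```

On its own, this rate favors over-splitting: putting every object in its own cluster gives an error rate of zero. This is why the number of created clusters is reported as the second measure.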

All runs have been performed on a standard PC (Pentium 166 MHz). One run ends
in a few tens of seconds (usually from 10 to 20 seconds). All presented results
have been averaged over 50 runs. Finally, one very important point is that the
ants' parameters in AntClass were always the same for all the databases, i.e. all
generated randomly within the same bounds (see figure 3). Ants were simulated
for 2000 iterations when clustering objects, and for 50000 iterations when
clustering heaps. The number of iterations of the K-means algorithm in AntClass
was set to 10.

4.2 Results on artificial databases 

The results reported in table 1 show the progression of AntClass towards a
relevant classification. In the first step of AntClass, the ant-based algorithm finds
an initial partition, but with classification errors and far too many clusters. At the
end of the second step, i.e. the use of the K-means on the initial partition found in
the previous step, the classification errors are reduced but the number of clusters
is still far too high. This is because the K-means algorithm is highly sensitive to
the initial partition: if this initial partition contains too many clusters, the final
partition is unlikely to be the optimal one. Once the third step has been performed,
the ants converge very closely to the right number of classes by working on heaps
of objects rather than on the objects themselves, although some classification errors
remain. At the end of the fourth step, using the K-means once more decreases the
classification errors again. But this time, since the number of classes and an almost
optimal partition have been well determined in the previous step, the K-means
finds optimal or near-optimal results.

Data set | # of cla. (real) | AntClass: # of cla. (aver.) | AntClass: cl. err. (aver.) | K-means: # of cla. (aver.) | K-means: cl. err. (aver.) | ISODATA: # of cla. (aver.) | ISODATA: cl. err. (aver.)
Artif. 1 | 4 | 4 | 0.00 % | 5.63 | 2.15 % | 4.53 | 1.64 %
Artif. 2 | 9 | 9.02 | 0.00 % | 9.73 | 12.78 % | 6.38 | 29.11 %
Artif. 3 | 4 | 4.42 | 4.66 % | 7.26 | 7.30 % | 6.58 | 7.84 %
Artif. 4 | 3 | 2.96 | 1.33 % | 9.60 | 0.00 % | 9.53 | 0.00 %
Iris | 3 | 3.02 | 15.4 % | 6.95 | 4.63 % | 4.59 | 6.28 %
Wine | 3 | 3.06 | 5.38 % | 8.98 | 8.57 % | 9.09 | 8.63 %
Glass | 2-6 | 7.7 | 4.48 % | 7.06 | 50.16 % | 2.34 | 42.98 %
Soybean | 4 | 4.82 | 0.13 % | 7.93 | 3.89 % | 7.94 | 4.77 %
Thyroid | 3 | 3.28 | 6.38 % | 8.77 | 8.26 % | 1.48 | 14.72 %

Table 2. Results obtained by AntClass, K-means and ISODATA with artificial and
real world databases.



4.3 Comparative study 

We now describe the results obtained with the K-means algorithm and with ISODATA
(Ball and Hall 1965). Each algorithm is initialized with 10 classes, and all objects
are initially assigned randomly to these classes. The K-means algorithm is used
with 10 iterations, as described in section 3.1. ISODATA is an improved version of
the K-means which can (1) delete classes with fewer than $Min_{obj}$ objects,
(2) split one class into two when the deviation in this class is above $Max_{dev}$,
and (3) merge two classes when the distance between them is less than $Min_{dist}$.

In the following, we have used a fixed $Min_{obj}$, $Max_{dev} = 1.027$ and
$Min_{dist} = 0.117$. These last two values have been determined from the data,
which favors ISODATA, because this represents important initial information that
has not been given to AntClass. To compute $Max_{dev}$, the data have been
normalized linearly between 0 and 1, and we have computed for each class the
deviation of the objects around the center of their real class. These values have
been averaged over all the datasets to obtain $Max_{dev}$. $Min_{dist}$ has been
computed in the same way, by averaging over the databases the minimum distance
between two class centers.
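The way these two ISODATA thresholds are derived from labelled data can be sketched as follows. The minimum distance between class centers follows the text directly; the deviation measure is an assumption (mean Euclidean distance of the objects to their real class center, averaged over classes), because its exact definition is not given. The inputs are assumed to be already normalized to [0, 1], and all function names are hypothetical.

```python
import math
from itertools import combinations


def class_groups(objects, labels):
    groups = {}
    for obj, lab in zip(objects, labels):
        groups.setdefault(lab, []).append(obj)
    return groups


def centroid(members):
    return tuple(sum(coord) / len(members) for coord in zip(*members))


def min_center_distance(objects, labels):
    """Minimum distance between two real-class centers of one data set."""
    centers = [centroid(m) for m in class_groups(objects, labels).values()]
    return min(math.dist(a, b) for a, b in combinations(centers, 2))


def mean_class_deviation(objects, labels):
    """Assumed deviation: mean distance of objects to their class center, averaged over classes."""
    devs = []
    for members in class_groups(objects, labels).values():
        c = centroid(members)
        devs.append(sum(math.dist(o, c) for o in members) / len(members))
    return sum(devs) / len(devs)


def isodata_thresholds(datasets):
    """Average both quantities over a list of (objects, labels) pairs."""
    max_dev = sum(mean_class_deviation(o, l) for o, l in datasets) / len(datasets)
    min_dist = sum(min_center_distance(o, l) for o, l in datasets) / len(datasets)
    return max_dev, min_dist


objs = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
print(isodata_thresholds([(objs, ["a", "a", "b", "b"])]))  # (1.0, 4.0)
```

Both functions read the true labels, which makes concrete why these settings give ISODATA information that AntClass never receives.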

ISODATA is also run for 10 iterations. We have tried larger numbers of
iterations for both algorithms, but the results were similar.

The results obtained by the three algorithms (AntClass, K-means and ISODATA)
are reported in table 2. As can be seen, AntClass outperforms the two other
algorithms, both in terms of classification errors and of correct number of classes.
The only exception is Fisher's Iris database. In this database, the Setosa class
is completely distinguishable from the two others. The last two classes are more
difficult to separate, unless more than three clusters are created, which is what
K-means and ISODATA do.

5 Conclusion 

We have presented in this paper a new hybrid, ant-based algorithm named
AntClass for data clustering in a knowledge discovery context. The main features
of this algorithm are the following. AntClass deals with numerical databases.
It does not require any initial information about the future classification, such
as an initial partition or an initial number of classes. AntClass introduces new
heuristics for the ant colony, and also a hybridization with the K-means algorithm
in order to improve the convergence. We have also introduced in AntClass
hierarchical clustering, where ants may carry heaps of objects and not just objects.
Furthermore, AntClass uses a heterogeneous population of ants in order to avoid
complex parameter settings having to be performed by the domain expert. Finally,
AntClass has been tested with success on several databases, including real world
ones.

Future work consists of testing how this model scales with larger databases
of several thousands of examples. We are also considering other sources of
inspiration from real ants for the clustering problem. For instance, ants that meet
on the board could exchange objects.
Abstract. This paper is concerned with the origin of pheromone communication
in complex societies, e.g., colonies of real ants and bees. The aim of the work is
to study whether pheromone communication among artificial ant agents in a
cooperative foraging scenario can arise simply by letting selection favor
successfully foraging genotypes. To this end, we introduce the ants war, in which
the ant agents that forage more food items than their opponents survive to the
next generation. In the experiments, we confirmed the emergence of pheromone
communication without any specific predefined scheme.



1 Introduction 

In nature, many species of animals exhibit a high level of cooperative behavior
and use various means of communication in their lives. Typical examples are the
societies of real ants and bees, which are known as social insects [7]. In ant and
bee colonies, individuals have simple, limited abilities, but the macro-level
behavior is sufficiently complex to enable the ants and bees to adapt to their
dynamic environment. Pheromone, a simple chemical medium of communication,
is known to play an important role in the formation of this complex behavior.
For instance, in ant colonies, ants that have discovered food items mark the way
back to their nest with a specific type of pheromone that attracts other ants.
Ants that sense the pheromone follow the pheromone trail to the food. When all
the food items have been carried back to the nest, the pheromone evaporates, and
the trail, which is no longer necessary, automatically disappears [1]. Several
researchers have studied designed (predefined) pheromone communication in
multi-agent systems [3] [5] [6].

In this paper, we focus on the fact that these social insects acquired pheromone
communication through their evolutionary process, and on the origin of pheromone
communication in complex societies. The aim of the work is to study whether
pheromone communication among artificial ant agents in a cooperative foraging
scenario can arise simply by letting selection favor successfully foraging
genotypes. To this end, we introduce the ants war, in which the ant agents that
forage more food items than their opponents survive to the next generation.
This evolutionary process is based only on the results of the war, and no special
fitness function is implemented. The meanings of the pheromones are not defined
by the designer; the ant agents acquire pheromone communication autonomously
through an evolutionary process driven by a Genetic Algorithm (GA). In the
computer experiments, we confirmed the emergence of pheromone communication
without any specific predefined scheme, even though only simple genetic operators
were implemented.

Fig. 1. Ants War Environment and Architecture of the Ant Agent (panels: grid world
in ants war; phenotype; action decision; action).

Fig. 2. A series of GA operations in one generation (4 individuals selected from the
pool as parents; winners of the war; 4 individuals returned to the pool).



2 Ants War and Evolutionary Strategy 

The ants war is introduced as the stage for the emergence of pheromone
communication. It is designed as a grid environment with several kinds of
components, i.e., “white ants colony”, “black ants colony”, “white pheromone”,
“black pheromone” and “food items” (see Fig. 1). The objective of each colony is
to carry back more food items than the opponent. Each ant can sense only the
grid cells neighboring its current position, and must decide its actions from this
sensed information alone. The information includes whether the sensing ant is
adjacent to fellow ants, to opponents, or to food items, and the intensities of its
own colony's pheromone. The ants in a colony share the same 3-layer neural
network, which selects one of 7 actions, i.e., “move forward”, “move backward”,
“move to right”, “move to left”, “force to the food”, “stand by”, and “put
pheromone”. For simplicity, the ants of any single colony are assumed to be
completely identical. The food items are too heavy to be carried back by a few
ants, so the ants are required to cooperate with each other and to allocate their
resources to appropriate food items. Communication within a colony is therefore
very important to win this war. Pheromone, the only communication medium,
is put down and sensed by the ants; it is a non-negative real value on each grid
cell, and it automatically diffuses and evaporates.
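The text states only that pheromone diffuses and evaporates on the grid. One common way to model this is sketched below; the update rule, the 4-neighborhood and both rates are assumptions, not the authors' settings, and `pheromone_step` is a hypothetical name.

```python
def pheromone_step(grid, diffusion=0.1, evaporation=0.05):
    """One update of a pheromone field stored as a list of rows.

    A fraction `diffusion` of each cell's pheromone is shared equally among its
    4-neighbours inside the grid (so diffusion alone conserves the total), then
    every cell loses a fraction `evaporation`. Both rates are assumptions.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            amount = grid[r][c]
            neighbours = [(r + dr, c + dc)
                          for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                          if 0 <= r + dr < rows and 0 <= c + dc < cols]
            if not neighbours:  # a 1x1 grid: nothing to diffuse into
                new[r][c] += amount
                continue
            new[r][c] += amount * (1.0 - diffusion)
            share = amount * diffusion / len(neighbours)
            for nr, nc in neighbours:
                new[nr][nc] += share
    return [[value * (1.0 - evaporation) for value in row] for row in new]


field = [[0.0] * 5 for _ in range(5)]
field[2][2] = 1.0  # an ant puts pheromone in the middle cell
field = pheromone_step(field)
print(round(field[2][2], 5), round(field[1][2], 5), round(sum(map(sum, field)), 5))
```

Repeated calls spread the deposit outwards while the total decays geometrically, so a trail that is no longer reinforced fades away.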

Fig. 3. The crossover and mutation operations used in the GA ((a) crossover
operation, (b) mutation operation).

Fig. 4. The graph representing the ratio to win against the criterion and the
standard deviation (x-axis: generation (x1000)).

For the evolution of pheromone communication, a GA is applied to genotypes which
directly encode the weight sets of the neural networks. At each generation of the
GA, two pairs of genotypes are randomly selected from the genetic pool, and the
two colonies of each pair fight in the ants war. The two winners become parents
and reproduce, their offspring being produced by mutation and crossover genetic
operators [4] (see Fig. 3). The number of genotypes in the pool is always kept
constant. In this GA, the selection pressure on genotypes is based only on the
results of the war, and no special fitness function is implemented. We study
whether pheromone communication can arise without a specific predefined scheme
by using this simple evolutionary mechanism.
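One generation of this tournament-style GA can be sketched as below. Several details are assumptions rather than the authors' choices: genotypes are flat lists of weights, crossover is one-point, mutation is Gaussian, and the two offspring replace the two losers, so the two winners plus two children are the 4 individuals returned to the pool (Fig. 2). The `toy_fight` function is a hypothetical stand-in for a simulated ants war.

```python
import random


def crossover(a, b, rng):
    """One-point crossover of two weight vectors (assumed operator)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(genotype, rng, rate=0.05, sigma=0.1):
    """Gaussian perturbation of some weights (assumed rate and scale)."""
    return [w + rng.gauss(0.0, sigma) if rng.random() < rate else w for w in genotype]


def one_generation(pool, fight, rng):
    """Two random pairs fight; the winners breed and their offspring replace the losers.

    fight(g1, g2) must return True when the colony built from g1 wins the war.
    The pool size never changes.
    """
    i1, i2, i3, i4 = rng.sample(range(len(pool)), 4)
    winners, losers = [], []
    for a, b in ((i1, i2), (i3, i4)):
        if fight(pool[a], pool[b]):
            winners.append(a)
            losers.append(b)
        else:
            winners.append(b)
            losers.append(a)
    child1, child2 = crossover(pool[winners[0]], pool[winners[1]], rng)
    pool[losers[0]] = mutate(child1, rng)
    pool[losers[1]] = mutate(child2, rng)
    return pool


def toy_fight(g1, g2):
    """Hypothetical stand-in for the ants war: the larger weight sum wins."""
    return sum(g1) > sum(g2)


rng = random.Random(0)
pool = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(20)]
for _ in range(1000):
    one_generation(pool, toy_fight, rng)
print(len(pool), sum(map(sum, pool)) / len(pool))
```

Selection here acts only through match outcomes, with no explicit fitness function, which is the property the experiment relies on.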

3 Experimental Results 

Fig. 5. Screen shots of the ants war with effective pheromone communication
((a) initial state, (b) t=100, (c) t=200).

We made 7 trials of the computer experiment with a population of 20 genotypes in
the pool. The maximum number of GA generations was set to 50,000. First, it is
necessary to show whether or not the evolution occurred properly. However, this is
difficult because the GA does not have any global fitness function to evaluate
genotypes. Thus, we measured the progress of the evolution by the average ratio of
wins against the final population, used as the criterion population [2]. All
individuals in the population of each generation fought against all individuals in
the criterion (final) population, so 400 (20 genotypes x 20 genotypes) matches were
played to evaluate one population. If the evolution progressed favorably, the
average ratio of wins against the criterion would increase gradually from the first
population to the final one, and finally be close to 0.5. In Fig. 4, the solid and
dashed lines represent the average ratio of wins and the standard deviation over the
experimental trials, respectively. We can confirm that evolution progressed in each
trial and that the selection pressure certainly worked. Moreover, the final
populations used as criteria show almost the best performance of all populations,
and the choice of criteria is suitable. If the choice were not appropriate, most
populations would win against the criterion, or most would lose, and we could not
evaluate each population appropriately.
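The evaluation criterion is an all-against-all win ratio. A minimal sketch, with a hypothetical `toy_fight` standing in for a simulated ants war (ties count as losses here, an assumption): a population evaluated against itself wins just under half of its 400 matches, which is why a well-evolved population is expected to approach 0.5 against the final population.

```python
def average_win_ratio(population, criterion, fight):
    """Fraction of matches won by `population` against the criterion population.

    Every individual fights every criterion individual, so 20 x 20 genotypes
    give 400 matches. fight(a, b) returns True when a's colony wins.
    """
    wins = sum(1 for a in population for b in criterion if fight(a, b))
    return wins / (len(population) * len(criterion))


def toy_fight(a, b):
    """Hypothetical stand-in: a genotype is one strength value; ties are losses."""
    return a > b


final = list(range(20))
print(average_win_ratio(final, final, toy_fight))       # 0.475
print(average_win_ratio([100] * 20, final, toy_fight))  # 1.0
```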

Next, we observe the evolved ant agents in the 7 trials. The obtained ant agents
can be roughly classified into the following three types.

Type 1: The drifting ants without pheromone. These ants tend to drift randomly
on the grid. If they touch food or fellow ants in front of them, they take the
forcing action. They scarcely put down any pheromone in the war, i.e.,
communication did not emerge.

Type 2: The ants drifting up and down with simple pheromone communication.
These ants drift up and down on the grid, and the ants clinging to fellows on
their left side continuously take the forcing action. Accordingly, they often form
a bar-like formation. If no ant touches the food, they fall into a deadlock. When
the leftmost ant in the bar formation does not touch the food, this ant occasionally
puts down pheromone. Other ants sensing the pheromone then move from their
current positions, and they can resolve the deadlock. We can say that this is a
simple form of pheromone communication.

Type 3: The marching ants with effective pheromone communication. The ants
sensing a low intensity of pheromone move backward, and they take the forcing
action when in contact with the food or with a fellow in front of them. In
addition, they put down pheromone indiscriminately. So the ants which are
initially placed in the upper area reach appropriate food efficiently, while the
remaining ants gather in the lower area, where the pheromone intensity becomes
greater because of this indiscriminate putting action. The ants sensing strong
pheromone begin to march forward while putting down pheromone, and they can
reach food items. Especially when a food item has been obtained and has
disappeared, they quickly begin to march forward and reach other food items
again. Screen shots of a war with this type of ants are shown in Fig. 5. In the
war, most ants were observed to take part in forcing the food, and they cooperated
closely with each other to win the war by using pheromone communication.

To investigate which type of ants is better than the others, the final populations
of each type fought each other, and the populations were evaluated with 400
(20 x 20) matches. The results are: Type 1 (372 wins) vs Type 2 (28 wins), Type 2
(8 wins) vs Type 3 (392 wins), and Type 1 (20 wins) vs Type 3 (380 wins). Type 2
is very weak because these ants are poor at finding the baits and at cooperating to
force them, although they use a primitive pheromone-style communication. Type 3
is the strongest of the 3 types, as we expected, which suggests that good
communication such as that of type 3 is effective for cooperating to win.

4 Conclusion 

Through computer experiments, we confirmed that pheromone communication
emerged without a specific predefined scheme. The meanings of the pheromone
differed between experimental cases, and the ants cooperated to survive in the GA
by using effective pheromone communication. We conclude that it is possible for
artificial complex societies to develop communication suited to their objectives
based on simple evolutionary strategies.
Abstract. Males may use sexual displays to signal their quality to
females; the handicap principle provides a mechanism that could enforce
honesty in such cases. Iwasa et al. [1] model the signalling of inherited 
male quality, and distinguish between three variants of the handicap prin- 
ciple: pure epistasis, conditional, and revealing. They argue that only the 
second and third will work. An evolutionary simulation is presented in 
which all three variants function under certain conditions; the assump- 
tions made by Iwasa et al. are questioned. 



1 Sexual Signalling and the Handicap Principle 

Sexual selection is a distinct subset of natural selection. The idea is that evolution 
is an exam with two papers: in order to reproduce, an animal must not only 
survive to adulthood, but, in a sexual species, it must gain mating opportunities 
with members of the opposite sex. One of Darwin’s insights was that selection for 
sexual attractiveness and selection for survival could exert opposing evolutionary 
pressures. If, for some reason, females came to prefer males with elaborate and 
costly ornaments, such as the peacock’s tail, then sexual selection would push 
towards yet more costly ornaments, because males with longer tails experience 
greater mating success. At the same time, natural selection would push for less 
costly ones, because males with longer tails are more vulnerable to predation 
and less likely to survive to adulthood. 

An early explanation for extreme male ornament traits and female preferences 
was that an initial, random bias led to linkage between trait and preference 
genes and that a runaway cycle of exaggeration then took place [2]. Another 
possibility, more recently explored, is that male ornaments function as “indicator 
mechanisms”, i.e., that they are used by males to signal their quality as mating 
partners to females [3]. This paper describes an evolutionary simulation, based 
on a population-genetic model by Iwasa et al. [1], that explores the conditions 
under which this kind of sexual signalling could evolve. 

The suggestion that male ornaments are signals of mate quality leads to
a problem in understanding how honest signalling could be maintained. Why
should low-quality males ever honestly signal their condition, when by doing so
they will make themselves unlikely to be chosen as mates? Why wouldn’t all
males produce the maximum advertisement regardless of their true quality, all
claiming, in effect, to be the most desirable? Zahavi’s handicap principle provides
a possible mechanism by which a sexual signalling system could be kept honest.

Zahavi [4] suggested that honesty could only be maintained in a communi- 
cation system if the signals were costly in some way. He proposed the counter- 
intuitive idea that signallers sacrifice some of their fitness (i.e., impose a handi- 
cap on themselves) in order to produce signals that will be believed by receivers. 
In the sexual signalling case, perhaps males are signalling their quality (e.g., 
vigour, viability or territory size) to females, and are being rewarded with a 
mating episode if they “convince” a female that they are of high quality. Let 
us suppose that the male signal is tail length. Zahavi realized that if growing a 
longer tail was cheap, i.e., if it had little deleterious effect on male fitness, then 
the signalling system would be vulnerable to bluffing: all males would come to 
have long tails, and the female preference for longer-tailed males would no longer 
be selected for. But if growing a long tail was costly in fitness terms, the commu- 
nication system would not be corrupted by bluffers: lower quality males would 
not be able to afford the necessary resources for growing a long tail. Tail length 
becomes an honest indicator of male quality because “cheating” is prohibitively 
expensive. Zahavi reasoned that only those communication systems in which the 
signal happened to be costly would escape collapse due to bluffing. Therefore, 
the stable systems found in nature are maintained by this mechanism. 

When the handicap principle was first introduced, it was generally not ac- 
cepted by theoretical biologists. Population-genetic models [5, 6] seemed to show 
that it could not be evolutionarily stable. However, the potential effectiveness 
of the handicap principle has been validated by several mathematical models in 
recent years; foremost among these is a model by Grafen [7]. This model estab- 
lishes that the handicap principle can work, but specifies an important proviso: 
the unit cost of producing the signal must be greater for a low quality signaller 
than for a high quality signaller. In other words, the fitness cost of extending 
one’s tail by an extra centimetre must be higher for unhealthy or weak males 
than for healthy strong ones. 
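Grafen's proviso can be stated compactly. In this sketch, $c(s, q)$ is notation introduced here rather than taken from [7]: it is the fitness cost of producing a signal of size $s$ (e.g. tail length) for a male of quality $q$. The proviso requires the marginal cost of signalling to decrease with quality:

$$\frac{\partial^2 c(s,q)}{\partial s\,\partial q} < 0.$$

For example, with the illustrative cost $c(s,q) = s/q$, one extra unit of tail costs $\partial c/\partial s = 1/q$, so it costs a male of quality $q = 1$ twice as much as a male of quality $q = 2$, and indeed $\partial^2 c/\partial s\,\partial q = -1/q^2 < 0$.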

The handicap principle was maligned and misunderstood because Grafen’s 
proviso about differential unit costs was not clear from Zahavi’s original formu- 
lation, and because several distinct interpretations of the principle are possible. 
Iwasa et al. [1] attempted to cut through the confusion. They detailed three vari- 
ant interpretations of the handicap principle, listed below, and suggested that 
different findings concerning the evolutionary stability of handicapped signalling 
could be explained by the fact that some authors were modelling one version and 
others another. 

1.1 Pure Epistasis Handicap 

In this variant, a particular set of genes determines a male’s tail length, and the
longer his tail, the less likely he is to survive to reproductive age. However, his
survival is also determined by his quality: higher-quality males are more likely
to survive, and for any given level of quality, a male is more likely to survive
if he has a shorter tail. Therefore the males that are most likely to die before
reaching reproductive age are those of low quality with long tails. Observing the 
adult population, one would find a correlation between the genes for quality and 
tail length. In technical terms, epistatic selection has resulted in linkage dise- 
quilibrium; in plainer language, long tails are linked to high quality, because all 
the long-tailed low-quality males died young. In consequence, a female’s prefer- 
ence for mating with long-tailed adult males will mean that she is more likely 
to achieve her goal of mating with a high-quality male. 

1.2 Conditional Handicap 

A long tail still reduces a male’s chances of survival to reproductive age, and 
again survival is primarily determined by quality. However, the expression of 
the gene for tail length is modified by quality: males of lower quality will not 
realize their full, genetically specified tail length but will grow a proportionately 
shorter tail. It is assumed that only the highest quality males have the resources 
to fully realize the tail length encoded in their genes. Because the expression 
of the tail-length gene is quality-dependent, observable tail length is correlated 
with quality even before mortality has taken its toll. A female preference for long 
tails will therefore translate into a preference for high-quality males. 

1.3 Revealing Handicap 

The expressed tail length of males is determined directly by a gene, as with 
the pure epistasis handicap. Survival to reproductive age depends on quality 
modified by tail length, as before. However, when the males reach reproductive 
age and are competing to be selected by females, only high-quality males succeed 
in maintaining their tails at their original, genetically specified length. Males of 
lower quality are less well able to withstand the rigours of their environment, and 
their tails are shortened due to, for example, attacks by predators or parasites 
[8]. Low quality males reveal their status by tending to have shorter tails as 
adults. Females preferring to mate with long-tailed males will thus mate with 
higher quality males on average. 

2 Modelling Sexual Signalling of Genetic Quality 

Almost all of the models that have recently demonstrated the plausibility of the 
handicap principle [7, 9, 10] have made a major simplifying assumption: namely, 
that the underlying male quality of interest to females is environmentally deter- 
mined. This could mean, for example, that males are advertising their level of 
nutrition, or the quality of the territory they possess. This assumption misses the 
interesting subset of cases in which males are believed to be informing females of 
their genetic quality. For example, when sage grouse Centrocercus urophasianus 
mate the males contribute only their sperm, leaving all other aspects of the 
project of raising offspring to females. Nevertheless, the females choose their 
mate carefully on the basis of his ornaments, display behaviour, and central po-
sition in the mating arena [11, 12]. If the males are advertising anything in this 
case, it must be their inherited genetic quality. 

It is not clear that an honest signalling equilibrium will exist for signals 
of genetically determined quality. A central problem is that there might not 
be any residual variation in male quality, and thus nothing to signal about. If 
the males were honestly advertising their quality, and females were choosing 
to mate with high-quality males, then after a few generations the males would
all be clustered around the optimum quality level. As Maynard Smith [13] has
argued, there should be no heritable variation remaining in fitness-related traits 
at equilibrium. And yet female sage grouse pay the costs of choice (e.g., time 
costs and predation risk) in order to choose the best male, when the male will 
contribute only his genes. This is known as the paradox of the lek: why aren’t 
modern sage grouse males all maximally viable, and thus equally attractive to 
females? The most likely answer is that mutation on fitness-related traits is 
negatively biased. That is, a single mutation event affecting the genes controlling 
a fitness-related trait is more likely than not to decrease the value of that trait. 
There is some empirical evidence for this: at least one component of fitness 
is mildly heritable in Drosophila [14]; this could only occur if mutational load 
kept fitness-related traits below their optimum value. A similar conclusion was 
reached in a review of evidence from many avian species [15]. 

Iwasa et al. [1] constructed a population-genetic model of the evolution of 
costly male advertisements and female preferences; they incorporated just such 
a negative mutation bias on the viability trait. Iwasa et al.’s model purports 
to show that honest signalling of genetically determined male quality can be 
evolutionarily stable. It is one of the very few models to deal with genetically 
determined quality, and provides the basis for the simulation described in this 
paper. 

Iwasa et al. reasoned that if females were prepared to pay a cost for their pref- 
erence, then there must be information worth having in the expressed values of 
the advertisement trait, and it was therefore an honest indicator of quality. Iwasa 
et al. derived three conditions for the existence of such a costly-preference equi- 
librium. The first was that mutation on the viability trait had to be negatively- 
biased. Otherwise, male viability levels would be clustered around the optimum, 
and the females would be in a position where random mating was just as likely 
to result in a high-viability partner as was a costly preference. The second con- 
dition was that the genetic correlation between preference and viability had to 
be greater than the product of the correlations between advertisement and pref- 
erence and between advertisement and viability. Another way of putting this is 
that there must be a link between preference and viability that does not come 
about solely because of their joint relationship with the male advertisement trait. 
Finally, Grafen’s proviso, that the unit costs of signalling must be greater for 
lower-quality signallers, must also hold. 
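The second condition can be read as a statement about partial correlation (our gloss, not notation used by Iwasa et al.). Writing $\rho$ for the genetic correlations between advertisement $t$, preference $p$ and viability $v$, and assuming that neither correlation involving $t$ equals $\pm 1$, the denominator below is positive, so the partial correlation of preference and viability given the advertisement trait is positive exactly when the stated inequality holds:

$$
\rho_{pv \cdot t} = \frac{\rho_{pv} - \rho_{tp}\,\rho_{tv}}{\sqrt{(1 - \rho_{tp}^{2})(1 - \rho_{tv}^{2})}} > 0
\quad\Longleftrightarrow\quad
\rho_{pv} > \rho_{tp}\,\rho_{tv}
$$

In other words, preference and viability must remain positively correlated once their shared dependence on the advertisement trait has been partialled out.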

Iwasa et al.’s second equilibrium condition implies that whereas the condi- 
tional and revealing handicaps will work, the pure epistasis handicap will not. 
In the conditional and revealing handicaps, the viability trait directly affects the 
expression of the male’s advertisement — viability modifies the expression of the 
genes for growing an ornament of a particular size, or, in the case of the reveal- 
ing handicap, low viability means that a large ornament cannot be successfully 
maintained as an adult. Valuable information for females concerning male via- 
bility has thus been built into the expressed male trait. However, in the pure 
epistasis handicap, the realized size of the male advertisement is only linked to 
viability indirectly, via differential survival. This means, in turn, that there is 
no special link between the genes for preference and for viability. 

In order to keep their analysis tractable, Iwasa et al. made a number of 
simplifying assumptions. Most critical was the assumption that the genetic co- 
variances between male advertisement, female preference, and the viability trait 
could all be treated as positive constants. In the real world, genetic covariances 
are of course not constant but change as the population evolves over time; even if 
we suppose that the covariances might start out positive, it is not clear that they 
would remain so. And it is more likely that the covariances in a plausible initial 
population would be close to zero. The problem highlights a weakness of the 
population-genetic approach: despite the name, there is in fact no population, 
which means that such important variables as genetic variances and covariances 
must be input into the model as parameters, rather than being measurements 
that are made with respect to an evolving lineage. The question as to whether 
Iwasa et al.’s conclusions would hold without assuming constant positive covari- 
ances [3] presents an excellent opportunity for an evolutionary simulation. 

3 Description of the Model 

The work reported here is an implementation of Iwasa et al.’s [1] model as an 
individual-based evolutionary simulation. The population consists of sexual indi- 
viduals breeding in discrete, non-overlapping generations. Individual organisms 
have both a genotype and a phenotype; the genotype consists of real-valued
genetic parameters. Each organism carries a gene for the male advertisement or
ornament trait ($t_{gen}$), the female preference trait ($p_{gen}$), and the general viability
trait ($v_{gen}$). An individual’s phenotype consists of two real values: either $t_{phen}$
or $p_{phen}$, depending on sex, and $v_{phen}$. Genotypic and phenotypic parameters
are always real numbers between zero and one inclusive.



3.1 Development stage 

At birth, each individual’s sex is chosen at random, and its phenotypic trait
values are determined. Normally, each trait is read off the genome, then a random
gaussian error term is added (μ = 0, σ = 0.005), and the resulting value stands
as the expressed trait. The phenotypic male advertisement is normally read off
the genotypic value $t_{gen}$. However, in the conditional and revealing handicaps
the male’s viability also influences the expressed ornament size, so the phenotypic
viability is calculated first. For the conditional handicap, the advertisement
size that would otherwise be expressed is scaled by $v_{phen}$; in other words,
$t_{phen} = t_{gen} \times v_{phen}$. Only males with the maximum possible viability
actually produce an advertisement that is as big as their genotype specifies.
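The development rules can be sketched in Python as follows. This is a minimal illustration, not the author's code: the function names are hypothetical, clamping to [0, 1] follows the statement that all trait values lie in that range, and applying the noise to the ornament before the conditional scaling is our assumption, since the text does not fix the order.

```python
import random


def clamp01(x: float) -> float:
    """Keep a trait value within [0, 1], as the model requires."""
    return max(0.0, min(1.0, x))


def express(gene: float, rng: random.Random, sigma: float = 0.005) -> float:
    """Read a trait off the genome and add gaussian developmental noise."""
    return clamp01(gene + rng.gauss(0.0, sigma))


def develop_male(t_gen: float, v_gen: float, rng: random.Random,
                 conditional: bool = False) -> tuple[float, float]:
    """Return (t_phen, v_phen) for a newborn male.

    Viability is expressed first because, under the conditional handicap,
    the expressed ornament is scaled by it: t_phen = t_gen * v_phen.
    Assumption: the noise is added to the ornament before the scaling.
    """
    v_phen = express(v_gen, rng)
    t_phen = express(t_gen, rng)
    if conditional:
        t_phen = t_phen * v_phen  # only maximally viable males reach full size
    return t_phen, v_phen
```

With `conditional=True`, a male carrying $t_{gen} = 0.8$ and $v_{gen} = 0.5$ expresses an ornament of roughly 0.4 rather than 0.8.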



3.2 Survival stage 

Some individuals survive to adult reproductive age, and some die young.¹ An
individual’s basic probability of survival is equal to its phenotypic viability: less 
viable animals are less likely to survive. However, both male advertisements and 
female preferences are costly, and the cost of these characteristics is manifested 
as a reduction in an individual’s probability of survival, according to the degree 
of the trait’s phenotypic expression. 

Grafen’s proviso, in which the unit costs of advertisement are lower for
higher-quality signallers, is enforced at this stage. The basic probability of survival
($v_{phen}$) is first converted to an odds ratio, and then scaled by

$$(1 - t_{phen})^{C_{adv}} \qquad (1)$$

where $C_{adv}$ represents the cost of advertising. If $C_{adv} = 0$ then there is no cost
at all to males for growing ornaments; if $t_{phen} = 0$ then a male will pay no costs
regardless of how high $C_{adv}$ might be. The scaled odds ratio is then converted
back to a probability value. The result of all this manipulation is the following
expression for the probability of survival:

$$P_{survival} = \frac{v_{phen}\,(1 - t_{phen})^{C_{adv}}}{v_{phen}\,(1 - t_{phen})^{C_{adv}} + 1 - v_{phen}}$$

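The scaling factor implements Grafen’s proviso. Because it is applied to the odds
rather than to the probability, a given ornament removes a smaller proportion of
the survival probability of an individual with high phenotypic viability than of one
with low viability, so high-viability individuals are best able to bear the costs of
advertisement.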
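The survival rule fits in a small Python function (a sketch, not the author's code; the function name is hypothetical). The same formula serves both sexes: pass $t_{phen}$ and $C_{adv}$ for a male, or $p_{phen}$ and $C_{pref}$ for a female. Returning 0 when $v_{phen} = 1$ and the scaling factor is 0 is our own convention for a case the text leaves undefined.

```python
def survival_probability(v_phen: float, trait_phen: float, cost: float) -> float:
    """Probability of surviving to adulthood under the odds-ratio cost scheme.

    The basic survival probability v_phen is turned into odds v / (1 - v), the
    odds are multiplied by s = (1 - trait_phen) ** cost, and the result is
    turned back into a probability. Algebraically that is
        v * s / (v * s + 1 - v),
    which, unlike the explicit odds, stays finite when v_phen == 1.
    """
    s = (1.0 - trait_phen) ** cost  # equals 1 when cost == 0 or trait_phen == 0
    numerator = v_phen * s
    denominator = numerator + 1.0 - v_phen
    if denominator == 0.0:  # v_phen == 1 and s == 0: assumed fatal
        return 0.0
    return numerator / denominator
```

For example, with a trait value of 0.5 and a cost of 1, an individual with $v_{phen} = 0.9$ keeps about 91% of its basic survival probability, whereas one with $v_{phen} = 0.5$ keeps only two thirds of it.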

The survival costs of female preference are assessed in exactly the same way 
as the costs of the male trait: $p_{phen}$ is simply substituted for $t_{phen}$, and $C_{pref}$ for
$C_{adv}$, in (1). Theories of handicap signalling generally do not require that female
preference should involve anything other than a simple cost that is independent 
of viability; however, calculating female costs in the same way as male ones 
allows the costs borne by each sex to be directly compared. 



3.3 Mating stage 

Surviving males and females are then able to breed: females get to exercise their 
preferences, and males may experience the benefits of their costly ornaments. A 
surviving female is randomly chosen, and she is then presented with a “lek” of 
eight males, also selected at random. With a probability equal to her preference
value ($p_{phen}$),² she selects the male with the largest expressed advertisement trait
to mate with. If she does not choose this male, she chooses randomly from among
the eight males. Thus, high-preference females are likely to end up mating with
the male with the biggest ornament, while zero-preference females will mate with
anyone. The results of an earlier simulation [16] suggest that this method can
be effective in producing sexual-selection effects, and that eight is a reasonable
lek size. Note that in Iwasa et al.’s model, female preferences were expressed
relative to the population mean of the male advertisement trait. Having a model
in which individuals really exist allows us to avoid the dubious assumption that
females could know the population mean; instead, females choose a mate from
among those males they happen to come into contact with.

¹ To prevent extinctions, in the rare event that no males (or no females) survive to
adulthood, one male (or one female) is randomly chosen for resurrection.
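The lek-based mate choice can be sketched as follows (hypothetical function names; males are represented only by their expressed ornament sizes). The sketch includes the rule that phenotypic preferences below 0.1 are treated as zero. Drawing the lek without replacement is our assumption, because the text says only that the eight males are selected at random.

```python
import random

LEK_SIZE = 8
PREFERENCE_FLOOR = 0.1  # preferences below this are set to zero (random mating)


def effective_preference(p_phen: float) -> float:
    """Apply the floor: very weak preferences count as no preference."""
    return 0.0 if p_phen < PREFERENCE_FLOOR else p_phen


def form_lek(male_ornaments: list[float], rng: random.Random) -> list[float]:
    """Draw up to LEK_SIZE surviving males at random, without replacement."""
    return rng.sample(male_ornaments, min(LEK_SIZE, len(male_ornaments)))


def choose_mate(p_phen: float, lek: list[float], rng: random.Random) -> int:
    """Return the index (within the lek) of the male the female mates with.

    With probability equal to her (floored) preference she takes the male with
    the largest expressed advertisement; otherwise she picks uniformly from the
    whole lek, which may still happen to be the largest male.
    """
    if rng.random() < effective_preference(p_phen):
        return max(range(len(lek)), key=lambda i: lek[i])
    return rng.randrange(len(lek))
```

A female with preference 1 therefore always takes the most ornamented male, while one with preference 0 ignores ornaments entirely.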

The mate selection process continues until sufficient offspring have been pro- 
duced to stock the next generation. Crossover is simple: newborn individuals 
inherit the mean of their parents’ values for each real-valued genetic parameter. 
The mutation operator is a random gaussian (μ = 0, σ = 0.03) added to each
gene. The all-important negative mutation bias on viability is implemented by 
subtracting 0.003 from whatever value a newborn individual’s genetic viability 
would otherwise have been. If the mutated value of any trait would be less than 
zero or greater than one, it is truncated accordingly. 
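The reproduction step can be sketched as below. Representing a genotype as a dictionary keyed `"t"`, `"p"` and `"v"` is a hypothetical choice for illustration; the operators themselves follow the text: parental mean, gaussian mutation, a fixed downward bias on viability, then truncation to [0, 1].

```python
import random

GENES = ("t", "p", "v")  # advertisement, preference, viability


def make_offspring(mother: dict, father: dict, rng: random.Random,
                   sigma: float = 0.03, viability_bias: float = 0.003) -> dict:
    """Create one newborn genotype from two parental genotypes."""
    child = {}
    for gene in GENES:
        value = (mother[gene] + father[gene]) / 2.0  # "crossover": parental mean
        value += rng.gauss(0.0, sigma)               # gaussian mutation
        if gene == "v":
            value -= viability_bias                  # negative mutation bias
        child[gene] = max(0.0, min(1.0, value))      # truncate to [0, 1]
    return child
```

Setting `sigma=0.0` isolates the bias: a child of parents with viabilities 0.6 and 0.8 then inherits 0.697.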

4 Results 

The population consisted of 100 individuals, and evolution proceeded in each run 
for 5000 generations. Unless otherwise stated, the results summarize a window 
period over the last 500 generations, and are averaged across 10 repeated runs 
in each case. The repeated runs in the various conditions were each performed 
with a different seed for the pseudo-random number generator. The simulations 
have been conducted over a range of values for the advertising and preference 
costs $C_{adv}$ and $C_{pref}$. Earlier work [16] suggests that males will be prepared to
bear much higher costs in advertising than females will tolerate in expressing a 
preference, and the range of cost levels investigated reflects this. 



4.1 Pure Epistasis Handicap 

Iwasa et al. argue that the key evidence for sexual signalling is the willingness of 
females to bear a costly preference; Figure 1(a) shows that such preferences do 
indeed evolve as long as both male and female costs are not excessive. But are 
females really gaining information about male quality from the advertisement 
traits they observe? Figure 1(b) shows the correlation, for adult males, between
their expressed advertisement and their underlying quality. When the cost of
advertising is greater than zero but less than about 5, male ornament size is
modestly correlated with viability, and therefore does carry information.
Signalling of viability does not occur across the full range of cost values, but it
certainly appears to be occurring in one region. This contradicts Iwasa et al.’s
claim that the honest advertisement of viability cannot be evolutionarily stable
given the pure epistasis handicap.

² Phenotypic preference values of less than 0.1 are in fact set equal to zero, i.e., females
with sufficiently low preference values mate randomly. This is to avoid a situation
in which there is selection pressure for random mating but the mean value of p
never quite reaches zero due to recurrent mutation. This would lead in turn to
a small female preference being manifested, which might well be enough to push
males towards advertising when they would not otherwise have done so.



[Figure 1: (a) Female preference; (b) Trait-viability correlation]

Fig. 1. Results for the pure epistasis handicap condition, by $C_{adv}$ and $C_{pref}$. (a) Mean
female preference values. (b) Correlation between the expressed male advertisement
trait and underlying viability.



4.2 Conditional Handicap 

Figure 2(a) shows the mean values for female preference: again, females were 
sometimes prepared to bear a cost in order to choose ornamented males. It 
is interesting to observe that, although preference falls off somewhat as its cost 
increases, high preference values are most likely to evolve when the cost of adver- 
tising is low. Figure 2(b) shows significant correlations between expressed male 
advertisement and underlying viability, especially when advertising costs are low. 
The results therefore support Iwasa et al.’s conclusion that an honest-signalling 
equilibrium could be stable under the terms of the conditional handicap. 

4.3 Revealing Handicap

The results for the revealing handicap simulations were very similar to those for
the conditional handicap; no additional graphs will be presented.

Fig. 2. Results for the conditional handicap condition, by $C_{adv}$ and $C_{pref}$. (a) Mean
female preference values. (b) Correlation between the expressed male advertisement
trait and underlying viability.

5 Discussion

Honest signalling of viability occurs in the pure epistasis handicap, even though
Iwasa et al. [1] claimed it could not. In the conditional and revealing
handicaps, on the other hand, the results presented here are in accordance with
Iwasa et al.’s prediction that honest signalling could be evolutionarily stable.
Iwasa et al. intend their paper to clarify some of the controversies around the 
handicap principle. They argue that their findings explain why some earlier 
papers have concluded that the handicap principle can work while others have 
concluded that it cannot: different authors have tried to model different versions 
of the idea. Iwasa et al.’s intended clarification is an admirable goal; however, 
the results of the simulations presented here suggest that their conclusions must 
be taken with a grain of salt. Their assumption that genetic covariances could 
safely be treated as positive constants does not appear to have been a reasonable 
one. 

The conditional and revealing handicaps deserve closer scrutiny. Consider 
Figure 2(b), which shows the correlation between expressed advertisement and 
viability in the conditional handicap case; results for the revealing handicap were 
similar. The graph shows that the highest correlations were achieved when ad- 
vertising was cost-free. In the pure epistasis handicap, by contrast, we find that 
“talk is cheap” in these cases: Figure 1(b) shows that male advertisement was 
never an indicator of quality when the cost of advertising was zero. Why then, 
in the conditional and revealing handicap conditions, can females trust the ad- 
vertisement levels of males who, in theory, can choose any advertisement level 
they like because there is no cost involved? The answer is that the males cannot 
choose any advertisement level that they might like. The stipulation that the 
expression of the ornament trait is condition-dependent (i.e., modified by via- 
bility) builds in an informational link between advertisement and viability in a 
rather uninteresting way. It seems disingenuous of Iwasa et al. to hold up the 
existence of costly female preference and honest advertisements as a deep result 
when the way in which the male trait is expressed itself enforces honesty. Thus 
we find that the genetic correlation between the advertisement trait and viability 
remains low in the conditional handicap case, but the correlation between the 
actual expressed advertisements and the underlying viability of adult males is
very much higher: the condition-dependent expression of the ornament means 
that females automatically get useful information about viability. In addition, 
it is a little odd to claim that “handicap” signalling is occurring when the cost- 
free signals are the most reliable. Considered closely, Iwasa et al.’s claims for 
the conditional and revealing handicaps amount to little more than the uncon- 
troversial observation that females will attend to unfakeable information about 
male quality. 
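The point that condition-dependent expression builds honesty in "for free" can be illustrated with a toy calculation (ours, not the paper's simulation). Ornament genes and viabilities are drawn independently, so there is no genetic correlation, no selection and no cost; the ornament is then expressed as $t_{gen} \times v$, as in the conditional handicap. The function names are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def conditional_expression_demo(n: int = 5000, seed: int = 0):
    """Return (genetic correlation, expressed correlation) with viability."""
    rng = random.Random(seed)
    t_gen = [rng.random() for _ in range(n)]  # ornament genes, independent of v
    v = [rng.random() for _ in range(n)]      # viabilities
    t_phen = [t * vv for t, vv in zip(t_gen, v)]  # condition-dependent expression
    return pearson(t_gen, v), pearson(t_phen, v)
```

For independent uniform draws the expected correlation between expressed ornament and viability is about 0.65, while the genetic correlation is close to zero: the expression rule alone makes the signal informative.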

Finally, it should be noted that all of the simulation results depend on 
Grafen’s proviso that the unit costs of advertisement (and in this case pref- 
erence as well) should be lower for higher-quality individuals. One’s faith in the 
simulation results must depend on one’s faith in Grafen’s proviso as a real-world 
condition. 
Abstract. This paper describes a model for the evolution of communication
systems using simple syntactic rules, such as word combinations. It also focuses
on the distinction between simple word-object associations and symbolic
relationships. The simulation method combines neural networks and
genetic algorithms. The behavioral task is influenced by Savage-Rumbaugh &
Rumbaugh’s (1978) ape language experiments. The results show that languages
that use combinations of words (e.g. a “verb-object” rule) can emerge through
self-organization and cultural transmission. The neural networks are tested to see
whether the evolved languages are based on symbol acquisition. The implications of
this model for Deacon’s (1997) hypothesis on the role of symbol acquisition in
the origin of language are discussed.



1. Symbol acquisition in the evolution of communication 

The synthetic approach of Artificial Life has recently been applied to studying the 
evolution of communication and language (Steels, 1997a). Some models have been 
used for the simulation of the emergence of simple lexicons in populations of 
simulated organisms (e.g. Cangelosi & Parisi, 1998; Steels, 1997b) or in small 
communities of robots (Steels & Vogt, 1997). In these studies organisms evolve shared
lexicons for describing entities and relations of the environment. Other models have 
focused on the evolution of syntax (e.g. Batali, 1994; Kirby, in press). Simulated 
organisms evolve different syntactic languages starting from a given set of syntactic 
structures and constraints, and devices for syntax acquisition. 

The first type of model, which focuses on lexicon emergence, makes no explicit
reference to the role of syntax in language origin. Its aim is to model the early
stages of the evolution of (animal) communication. Indeed, in animal communication
systems, no syntactic structures have been observed. For example, no animal 
communication systems have been found that share one of the main properties of 
human languages, i.e. the combination of words to express different and new 
meanings. These models of lexicon evolution study communication systems based on 
simple signal-object associations. Organisms learn and evolve simple stimulus 
associations between objects in the environment and signals. 

In the second type of models the evolution of syntax is simulated. These models,
which for example show how syntax can emerge without natural selection (Kirby, in
press), do not explain the possible role of syntactic languages in organisms’
adaptation and survival. Moreover, the associations that organisms learn are
self-referential symbol-symbol relationships. These models are subject to the symbol
grounding problem (Harnad, 1990) since they lack an intrinsic link between their
symbols and the entities and relations existing in the organisms’ environment. Internal
symbols need some form of sensorimotor grounding. Because of the symbol grounding
problem, these models are of limited use for understanding the evolution of cognition.
The Artificial Life methodology used in this model, instead, makes it possible to
overcome the symbol grounding problem. Simulated organisms will use symbols
whose semantic referents are constituted by categorical representations in the neural
network's hidden layer. These semantic representations are activated by the actual
presence of their referents in the organism's world.

Recently, Terrence Deacon (1997) proposed an explanation for the fact that animal
communication and human language differ. A variety of animal communication
systems have been studied (Hauser, 1996); however, there is no apparent continuity
between animal communication systems and complex human languages. That is, no
“simple languages”, using some elementary forms of word combinations or syntax,
have been found in the animal kingdom. The existence of simple languages could
explain the gap between animal and human communication. Deacon (1997; 1996)
believes that this is due to the symbol acquisition problem: the main difference
between animals and humans lies in symbolic reference. There is a significant
difference between the referencing system of simple object-word associations and that
of symbolic associations. In animals, simple associations between world entities and
words can be explained by mere mechanisms of rote learning and conditional
learning. An animal acquires genetically, or learns, that a word’s sound is always
associated with a specific object. Symbolic associations, instead, have double
references, one between the word (symbol) and the object, and the second between
the symbol itself and other symbols. A language-speaking human knows that a word
refers to an object and also that the same word has logical (syntactic) relations with
other words. Because of the possible combinatorial interrelationships between words,
there can be an exponential growth of reference with each newly added word.

The difference between these types of associations, and their relation to the models 
of language origin, is graphically represented in Figure 1. Figure 1a represents a
communication system based on simple associations between objects and words. It
refers to the models of the acquisition of lexicons. Figure 1b represents the models of
the origin of syntax. It only shows word-word associations, but this system is not
based on real symbolic associations as a link is missing between words and objects.
Words are self-referencing and they lack a grounding in the external world. Figure 1c
shows a system based on grounded symbolic associations. The arrows represent 
references between words, and references between words and objects. This third type 
of association system can be simulated through Artificial Life methodology, upon 
which the present model is based. 

It should be noted that the relationship between words and objects, which constitutes
the grounding of symbols in entities of the real world, is not a direct link between
mental symbols and real objects. Instead, it is a link between mental entities (the
symbols or words) and other mental entities (such as concepts) that constitute the
semantic reference. These categorical representations, which Deacon (1996) calls
“indexical”, are useful to “sort out” the extensive perceptual variability of objects in
the real world. The ability of humans and animals to create categories, e.g. through
categorical perception, constitutes the “groundwork” of cognition (Harnad, 1987).
From this it is possible to build more complex cognitive skills, such as language.




Fig. 1. Associations between objects (pictures) and symbols (words) in language origin models.
(a) Simple stimulus associations between objects and words in the models of the origin of
lexicons. (b) Self-referential associations between words that lack sensory-motor grounding in
the models of the origin of syntax. (c) Grounded symbolic associations. Words have links with
objects and logical relationships between themselves. Objects and words were chosen from
Savage-Rumbaugh & Rumbaugh’s (1978) experiments on ape language.

Deacon’s hypothesis on the role of symbolic learning in the evolution of human 
language is supported by ape language studies and by neuropsychological and 
neurophysiological evidence (Deacon, 1997). For example, experiments on language
acquisition in chimpanzees have been used to support the idea that animals tend to 
learn language using simple word-object associations. However, apes can be taught 
real symbolic associations under special experimental conditions (Savage-Rumbaugh 
& Rumbaugh, 1978). Moreover, in these language-speaking animals the spontaneous 
use of the grammatical rule “verb-object” has also been observed (Greenfield & 
Savage-Rumbaugh, 1990). 

This paper aims to test a model of the origin of communication and language that 
deals with the evolution of different types of associations. The model should be able 
to study how different word/object relationships can evolve and also to define the 
mechanisms that explain the passage from communication based on simple stimulus 
association to languages based on grounded symbolic references. For this reason 
languages based on two-word combinations will be evolved. The behavioral task is 
influenced by ape language studies (Greenfield & Savage-Rumbaugh, 1990). 



2. Method 

The simulation method combines the use of artificial neural networks and genetic 
algorithms. It uses the methodology and theoretical framework of Ecological Neural 
Networks (Econet: Parisi, Cecconi & Nolfi, 1990). Populations of organisms are 
evolved according to their behavioral performance in foraging tasks. Organisms’ 
behavior is controlled by neural networks. 

In the present simulation, the environment setting for the foraging task consists of a
2D grid of 100x100 cells. About 1200 cells are occupied by randomly placed foods
(mushrooms). The foods are grouped into two main functional categories: edible
mushrooms (E), i.e. foods that need to be collected to increase organisms’ fitness, and
toadstools (T), i.e. mushrooms that must be avoided. The first category of edible
mushrooms is then split into three functional subcategories: white (e1), yellow (e2),
and gray (e3). These are called functional categories because they require organisms
to perform a different task when approached (e.g. white mushrooms e1 should be
picked and cut, whilst other colored mushrooms require different actions). The fitness
formula adds one point for each e1/e2/e3 mushroom that an organism approaches
and properly treats according to its color. When a toadstool is collected, the fitness is
decreased by one point. The toadstool category does not have any functional
subcategory. Even though toadstools are perceptually classifiable into three categories
(t1, t2, t3), these are not functional categories because the fitness formula removes
one point for each toadstool that the organism reaches, regardless of its appearance.
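The fitness rule can be sketched as a short scoring function. The encoding is hypothetical: each reached mushroom is recorded together with the treatment the organism applied (one of `"e1"`, `"e2"`, `"e3"`). The text does not say what happens when an edible mushroom is reached but treated wrongly; here that case simply scores zero.

```python
def foraging_fitness(encounters):
    """Score a lifetime of encounters.

    encounters: iterable of (mushroom, treatment) pairs, one per mushroom the
    organism reached. mushroom is 'e1', 'e2', 'e3' (edible) or 't1', 't2', 't3'
    (toadstools); treatment is the subcategory the organism's action signalled.
    """
    score = 0
    for mushroom, treatment in encounters:
        if mushroom.startswith("t"):
            score -= 1  # any toadstool costs a point, whatever its appearance
        elif treatment == mushroom:
            score += 1  # edible mushroom treated according to its color
    return score
```

Because all three toadstool types are penalised identically, nothing in this score rewards telling t1, t2 and t3 apart, which is why they are not functional categories.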

The organization of the foraging task stimuli into a hierarchy of functional 
categories was derived from the experimental setting of ape language studies. For 
example in Savage-Rumbaugh & Rumbaugh (1978) chimpanzees had to learn to use 
different lexigrams (graphic symbols in a keypad) to name solid foods (e.g. banana, 
orange) and drinks (coke, milk). Since they received food from a vending machine,
they also needed to learn a lexigram for the verb associated with solid foods (“give”) and
one for the liquid drinks (“pour”). These stimuli constitute a hierarchy of two
high-level functional categories (verbs) followed by four low-level categories (two foods
and two drinks). In our model organisms will have to learn a name for each of the
three edible subcategories (e.g. “white” for e1, “yellow” for e2, and “gray” for e3),
plus a common verb for the whole edible category, e.g. “approach”. All toadstools 
will require the use of a common verb, such as “avoid”. The three toadstool 
subcategories do not require a specific name to be identified, but organisms will be 
allowed to name them. 

The neural networks controlling the organisms’ behavior have a 3-layer 
feedforward architecture (see figure 2). The input layer has 29 units, organized into 
three groups of sensory units. The first group has 3 units, one for each of three 
neighboring 40-degree visual fields. The unit corresponding to the visual field in which 
the closest mushroom is perceived is activated, and its activation value is the 
distance of this mushroom (range 0-1). The second group of input units has 18 nodes 
that encode the (visual) features of mushrooms: each mushroom has a set of 
18 binary features, and the mushrooms of each subcategory share a common binary 
pattern. For example, e1 mushrooms share the prototype 111*************** 
and e2 mushrooms share ***111************, where an asterisk (*) represents a random bit. The 
third group of 8 input nodes are localist language units; each unit is activated whenever 
the corresponding word is used. The hidden layer has 5 units. The output layer has 
two groups of units. The first 3 units encode the actions: two binary units control the 
movement (move one cell forward, turn left, turn right, stand still) and one unit 
discriminates between e1, e2, and e3 (activation a < .2, .2 < a < .8, and a > .8 respectively). 
The second group has 8 linguistic units, organized into two clusters of 
winner-takes-all units, so that only two words at a time are active (one per cluster). One 
cluster has 2 units, the other 6 units. The hidden and output units use the sigmoid 
activation function. 
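The architecture can be sketched in plain Python as follows. Plain lists for the weights, no bias terms, and initial weights drawn uniformly from [-1, 1] are assumptions of this illustration rather than details given in the text, as is the order of the two word clusters in the output vector. The sketch checks the 3 + 18 + 8 = 29 input units and decodes the outputs: the discrimination unit is read with the .2/.8 thresholds, and each winner-takes-all cluster contributes one active word.

```python
import math
import random

N_VISUAL, N_FEATURES, N_WORD_INPUTS = 3, 18, 8   # 3 + 18 + 8 = 29 input units
N_INPUT = N_VISUAL + N_FEATURES + N_WORD_INPUTS
N_HIDDEN = 5
N_ACTION = 3                 # two movement units + one e1/e2/e3 discrimination unit
CLUSTERS = (2, 6)            # winner-takes-all word clusters (order is an assumption)
N_OUTPUT = N_ACTION + sum(CLUSTERS)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def random_weights(n_out, n_in, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

def layer(inputs, weights):
    # Sigmoid units without bias terms (a simplification of this sketch).
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def forward(w_ih, w_ho, inputs):
    assert len(inputs) == N_INPUT
    hidden = layer(inputs, w_ih)
    out = layer(hidden, w_ho)
    return out[:N_ACTION], out[N_ACTION:]        # action units, word units

def decode_category(a):
    # a < .2 -> e1, .2..8 -> e2, a > .8 -> e3 (handling of exact boundaries assumed)
    if a < 0.2:
        return "e1"
    return "e2" if a <= 0.8 else "e3"

def winners(word_outputs):
    # Index of the most active unit within each winner-takes-all cluster.
    result, start = [], 0
    for size in CLUSTERS:
        cluster = word_outputs[start:start + size]
        result.append(cluster.index(max(cluster)))
        start += size
    return tuple(result)

rng = random.Random(0)
w_ih = random_weights(N_HIDDEN, N_INPUT, rng)
w_ho = random_weights(N_OUTPUT, N_HIDDEN, rng)
# Closest mushroom in the middle visual field at distance 0.4, random features, no words heard.
inputs = [0.0, 0.4, 0.0] + [rng.randint(0, 1) for _ in range(N_FEATURES)] + [0.0] * N_WORD_INPUTS
actions, words = forward(w_ih, w_ho, inputs)
print(decode_category(actions[2]), winners(words))
```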

Evolution is organized into two sequential stages. The first stage lasts 300 
generations, and organisms do not communicate at all. They only use the mushroom 
position and feature information to evolve the proper foraging action. The population 
consists of 80 individuals. Organisms live in the same environment for 1000 actions 
(20 epochs of 50 moves each). The 20 organisms with the highest fitness are 
selected, and each reproduces, making 4 offspring. The new organisms’ genotypes, i.e. 
the sets of connection weights encoded as real numbers, are then mutated by slightly 
modifying 10% of the weights. 
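A single generation of this first-stage genetic algorithm might look like the sketch below. It assumes that the number of offspring per parent is chosen to keep the population at 80 (80 / 20 = 4), and that a "slight" modification adds uniform noise in [-1, 1] to the selected weights; the noise range and the function names are illustrative. Without bias terms, the network described above would have 29 × 5 + 5 × 11 = 200 weights, which is the genotype length used in the usage lines.

```python
import random

POP_SIZE = 80
N_PARENTS = 20
OFFSPRING_PER_PARENT = POP_SIZE // N_PARENTS   # 4, so the population stays at 80
MUTATION_FRACTION = 0.10                        # 10% of the weights are modified
MUTATION_SCALE = 1.0                            # hypothetical size of a "slight" change

def mutate(genotype, rng):
    """Copy a parent's weight vector and perturb a random 10% of its weights."""
    child = list(genotype)
    n_mutated = max(1, round(MUTATION_FRACTION * len(child)))
    for i in rng.sample(range(len(child)), n_mutated):
        child[i] += rng.uniform(-MUTATION_SCALE, MUTATION_SCALE)
    return child

def next_generation(population, fitnesses, rng):
    """Truncation selection: the 20 fittest each produce OFFSPRING_PER_PARENT mutants."""
    ranked = sorted(range(len(population)), key=lambda i: fitnesses[i], reverse=True)
    parents = [population[i] for i in ranked[:N_PARENTS]]
    return [mutate(p, rng) for p in parents for _ in range(OFFSPRING_PER_PARENT)]

rng = random.Random(1)
population = [[rng.uniform(-1, 1) for _ in range(200)] for _ in range(POP_SIZE)]
children = next_generation(population, list(range(POP_SIZE)), rng)
```

Parents are not copied into the next generation here; in the second stage the text keeps them alongside their children, but only as speakers and teachers.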




Fig. 2. Neural network architecture and the interaction between the child organism and its 
parent. The parent uses words to describe the closest mushroom. In the Listening Task the child 
uses these words to decide which action to take. In the following Naming Task the child uses 
the parent’s words as teaching input for error backpropagation. 

In the second stage of evolution, starting at generation 301, communication 
between organisms is allowed. The 20 parent organisms are kept together with the 80 
new siblings. The parent organisms work only as speakers and language teachers. 
They cannot eat mushrooms and do not replicate. The need for a two-stage simulation, 
in which only the foraging behavior is evolved at first and communication is 
introduced subsequently, was suggested by a previous study (Parisi, Denaro & 
Cangelosi, 1998) in which the simultaneous evolution of foraging and language 
proved difficult. During each time interval, child organisms perform two tasks 
(figure 2). The first is a Listening Task: to discriminate the type of mushroom, 
children use the words produced by their parents as input. In fact, most of the time 
children rely solely on the parents’ linguistic input, because they perceive the 
mushroom’s features only 10% of the time. After the network activation cycle of the 
Listening Task, children perform a Naming Task: they use the mushroom’s 18-bit 
features to name the food type, and an error backpropagation algorithm is then applied. 
The error is computed using the parent’s words as teaching input, so that children 
learn the same linguistic description given by their parents. Some noise is added to the 
error between the child’s linguistic output and the parent’s teaching input, to 
allow variability in the process of cultural transmission of language (Parisi et al., 
1998). The same backpropagation algorithm is used for an Imitation Task, in which the 
organism’s neural network learns the auto-association of the input-output linguistic 
stimulus. Organisms live for 2000 time steps, a longer lifetime than in the first stage, 
because backpropagation learning requires more stimulus presentations. When the selected 
organisms reproduce, their offspring inherit the parent’s connection weights as they were 
before backpropagation learning occurred; no Lamarckian inheritance of learned 
weights is allowed. Darwinian selection continues to evolve the ability to 
approach/avoid mushrooms, while language learning between parents and offspring 
permits cultural transmission between consecutive generations. 
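The two child tasks of the second stage can be sketched as below. This is an illustrative reading only: zeroing the feature units when features are not perceived, leaving the word inputs silent during naming, the uniform noise model and the learning rate are all assumptions, and only the output-layer step of backpropagation (hidden-to-word weights) is shown.

```python
import random

FEATURE_VISIBILITY = 0.10   # children perceive the mushroom features 10% of the time

def listening_input(visual, features, parent_words, rng):
    """Listening Task input: the parent's words are always present; the 18 feature
    units are filled only with probability 0.1 (zeroed otherwise, an assumption)."""
    visible = rng.random() < FEATURE_VISIBILITY
    feats = list(features) if visible else [0] * len(features)
    return list(visual) + feats + list(parent_words)

def naming_input(visual, features, n_word_units=8):
    """Naming Task input: the child names the mushroom from its features;
    the word input units are left silent (an assumption)."""
    return list(visual) + list(features) + [0] * n_word_units

def noisy_word_deltas(child_words, parent_words, noise, rng):
    """Output-layer backpropagation deltas for the sigmoid word units. The parent's
    words are the teaching input, and noise is added to each error term."""
    deltas = []
    for y, t in zip(child_words, parent_words):
        error = (t - y) + rng.uniform(-noise, noise)
        deltas.append(error * y * (1.0 - y))    # times the sigmoid derivative
    return deltas

def update_hidden_to_words(w_hw, hidden, deltas, lr=0.1):
    """Gradient step on the hidden-to-word weights (one row per word unit)."""
    for row, delta in zip(w_hw, deltas):
        for j, h in enumerate(hidden):
            row[j] += lr * delta * h
    return w_hw
```

With the noise set to zero the child simply copies its parent's naming; a non-zero noise level lets the population's lexicon drift slightly from one generation to the next.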

During the second stage of evolution, the interaction between parents and children 
can result in the emergence and auto-organization of a shared language. Because children 
can perceive the mushroom’s features only 10% of the time, this condition should favor 
the evolution of a good language that discriminates at least the functional categories 
T, e1, e2, e3. Fitness depends on the correct identification of these four categories. 



3. Results 



The first stage of evolution, which does not permit communication, was repeated 10 
times using different random populations. Nine of these replications resulted in an 
optimal classification behavior: organisms evolved the ability to approach edible 
mushrooms E and avoid all toadstools T. Moreover, according to the type of edible 
mushroom (e1, e2, e3), they produced the correct activation in the third node of the 
output action units. In the one population where the evolved behavior was poor, 
organisms were unable to discriminate between e2 and e3. The average fitness of 
the 9 successful populations is shown in Figure 3 (first 300 generations). At 
generation 300 the best individual of each population collects on average 90 edible 
mushrooms (i.e. 4.5 mushrooms in each of the 20 epochs) and avoids all toadstools. 




Fig. 3. Fitness of the best individuals and of the groups of 20 selected parents. Simulation 
without communication (generations 1-300) and during the evolution of communication 
(generations 301-400). The values of generations 1-300 are averaged over 9 successful 
simulations. The fitness of generations 301-400 is averaged over 11 populations. 

For the second stage of evolution only the nine successful populations were used. 
The one simulation with unsuccessful fitness growth was excluded because language 
evolution requires a preliminary ability to discriminate behavioral categories (Parisi et 
al., 1998). This stage lasted 100 generations. For each population, two runs with 
different random initial conditions were executed, giving 18 replications in total. 

The distribution of evolved languages is shown in Table 1. In 11 of 
the 18 runs, populations evolved good languages, i.e. languages using at least four 
words/word combinations to distinguish the four behavioral categories T, e1, e2, e3. 
These languages emerged through a process of auto-organization of the lexicon, due 
to the interaction between organisms and the process of cultural transmission. The 
average fitness for the 11 successful populations is shown in figure 3 (generations 
301-400). In the remaining 7 populations the emerged language was poor: 
some mushroom types were incorrectly labeled due to the lack of a specific symbol 
or symbol combination. The fitness of these populations is therefore very low, since 
these mushrooms were incorrectly described and collected. 





                      Single word   Word combination   Verb-object   TOTAL
Good languages        1 (9%)        3 (27%)            7 (64%)       11
Imperfect languages   1 (14%)       2 (29%)            4 (57%)       7

Table 1. Distribution of language types in the simulations for the evolution of communication. 
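The percentages in Table 1 are row percentages. The short check below recomputes them from the counts, together with the overall share of good languages (11 of 18 runs) that is discussed later.

```python
TABLE_1 = {
    # (single word, word combination, verb-object)
    "Good languages": (1, 3, 7),
    "Imperfect languages": (1, 2, 4),
}

def row_percentages(counts):
    total = sum(counts)
    return total, [round(100 * c / total) for c in counts]

for row, counts in TABLE_1.items():
    print(row, *row_percentages(counts))
# Good languages 11 [9, 27, 64]
# Imperfect languages 7 [14, 29, 57]

good = sum(TABLE_1["Good languages"])
imperfect = sum(TABLE_1["Imperfect languages"])
print(round(100 * good / (good + imperfect)))   # 61
```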

In the previous section we explained that the linguistic output units are organized 
into two winner-takes-all clusters, one made up of 6 linguistic units 
(words) and the other of 2 units. The cluster-based structure does not imply that a 
combination of two words is always necessary to describe a mushroom. In fact, the 
optimal behavior requires the production of only four actions, and therefore four 
words from the 6-unit cluster are enough to name these categories. This is what 
happened in one of the good-language populations: organisms used four words 
of the 6-unit cluster to name the four categories T, e1, e2, e3, and the two words of the 
2-unit cluster were not systematically associated with any mushroom. 

When both clusters are used, there are several possible ways of combining words. 
However, we are interested in identifying word-combination rules that resemble 
known syntactic structures. In particular, we want to establish whether a verb-object rule 
has emerged. Of the populations where good communication evolved, ten 
(91%) evolved languages that use combinations of symbols. Among these, three 
populations (27%) use various combinations of two words, and seven (64%) use 
verb-object rules. We identify a verb-object rule when each linguistic unit in the 
two-word cluster is systematically associated with only one of the high-order 
categories T and E: one “verb” symbol is always used for all toadstools (“avoid”) and 
the other for all edible mushrooms (“approach”). The units in the 6-word cluster are 
used for distinguishing single “objects” (mushroom types), with which the two verbs 
systematically combine. 
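The classification criterion can be stated as a small function. In this sketch each population's lexicon is summarized, hypothetically, by the typical (verb unit, word unit) pair it produces for each functional category, with the verb taken from the 2-unit cluster and the word from the 6-unit cluster; real populations would first require this mapping to be estimated from their noisy outputs. Table 1 also classifies imperfect languages by type, whereas this sketch simply returns "imperfect" for them.

```python
from typing import Dict, Tuple

# (verb unit in the 2-unit cluster, word unit in the 6-unit cluster)
Description = Tuple[int, int]
CATEGORIES = ("T", "e1", "e2", "e3")
EDIBLE = ("e1", "e2", "e3")

def classify_language(lexicon: Dict[str, Description]) -> str:
    descriptions = [lexicon[c] for c in CATEGORIES]
    if len(set(descriptions)) < len(CATEGORIES):
        return "imperfect"                     # two categories share a description
    verbs = {c: lexicon[c][0] for c in CATEGORIES}
    words = {c: lexicon[c][1] for c in CATEGORIES}
    edible_verbs = {verbs[c] for c in EDIBLE}
    edible_words = {words[c] for c in EDIBLE}
    # Verb-object: one verb for all edible mushrooms, the other for toadstools,
    # with the 6-unit cluster naming the individual edible "objects".
    if len(edible_verbs) == 1 and verbs["T"] not in edible_verbs and len(edible_words) == 3:
        return "verb-object"
    # Single word: the 6-unit cluster alone separates all four categories.
    if len(set(words.values())) == len(CATEGORIES):
        return "single word"
    return "word combination"

print(classify_language({"T": (0, 5), "e1": (1, 0), "e2": (1, 1), "e3": (1, 2)}))
```

Checking the verb-object pattern first means that a language whose 6-unit words happen to separate all four categories is still counted as verb-object when its verbs split cleanly along T and E.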



4. Discussion 



The aim of this research was to develop a model of the evolution of communication 
systems based on simple syntactic rules, such as word combination. Moreover we 
were interested in establishing whether the evolved word-object relationships were 
based on symbolic learning or mere object-word associations. The results 
show that languages that use combinations of words (e.g. the verb-object rule) 
can emerge by auto-organization and cultural transmission. During the first stage 
organisms forage using the 18-bit feature information to discriminate mushrooms. 
Throughout the second stage, the foraging strongly depends on the evolution of a 
useful language, as features are rarely available. This condition has resulted in the 
rapid emergence of a shared language. In fact, within 30 generations the organisms’ 
average fitness is the same as in generation 300, when the mushroom feature 
information was available all of the time. Moreover, the final fitness at generation 400 
is higher (99 for the best organism) than that at generation 300 (91). This could be due 
to the fact that for neural networks it is easier to process discrete information, such as 
localist linguistic input, than the 18-bit feature information. 

The percentage of good languages evolved is 61%, as 11 out of 18 populations 
evolved useful languages. The remaining 7 populations (39%) evolved imperfect 
languages. However, the discriminative quality of these languages was relatively 
good. In the majority of them only one of the four functional categories is incorrectly 
labeled, with two of the edible mushroom categories named by the same word or word 
combination. Note that these imperfect languages also tend to use word combinations, 
and in particular most of them evolve the use of a verb-object rule. 

After having shown that communication systems based on the combinatory rule 
“verb-object” can evolve by auto-organization, we want to 
analyze the kind of referencing system that organisms use when they associate words 
with objects. We are interested in establishing whether the evolved languages are based on 
the use of grounded symbols, i.e. words that have a direct association with objects and 
that have logical relationships between them. We used a symbol acquisition test 
consisting of training organisms on a perfect combinatory language using the 
verb-object rule. The test was structured into three stages. In the first stage, organisms 
learn to name each of the four categories e1, e2, t1, t2. The teaching input is not 
provided by the parent organisms, but directly by the researcher. At this stage verbs 
are not used, and no names are taught for the two categories e3 and t3. In the second 
stage, organisms learn to associate the two verbs “approach” and “avoid” with the 
categories e1/e2 and t1/t2 respectively. Organisms are now expected to learn the 
logical relationship between the names of the two edible mushrooms e1 and e2 and 
the verb “approach”, and likewise the symbolic association between the verb “avoid” 
and the names of toadstools t1 and t2. In the final stage, the names of categories e3 
and t3 are introduced, but the association of the two verbs with these new names is 
not taught. After this training, organisms that have learned real symbolic 
relationships between verbs and names are expected to be able to generalize the 
verb-object relationship to the new mushroom names. If the verb “approach” is not 
associated with e3, the organisms did not learn any symbolic association between the 
names of e1 and e2 and the verb “approach”: they simply learned two independent 
object-word associations, one between e1 and its name, and another between the 
same e1 and the verb “approach”. 
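The logic of the test, namely which associations are taught and which are only probed, can be made explicit with a small sketch. The data layout and function names are hypothetical. The point it illustrates is that the verbs for e3 and t3 appear in no training stage, so a correct verb for them can only come from generalization.

```python
# The three training stages of the symbol acquisition test, as described in the text.
STAGES = [
    {"names": ["e1", "e2", "t1", "t2"], "verbs": {}},                   # stage 1: names only
    {"names": [], "verbs": {"e1": "approach", "e2": "approach",
                            "t1": "avoid", "t2": "avoid"}},             # stage 2: verbs
    {"names": ["e3", "t3"], "verbs": {}},                               # stage 3: new names
]

def generalization_probes(stages):
    """Categories whose names were taught but whose verb never was, with the
    verb a symbolic learner should nevertheless produce."""
    named, with_verb = set(), set()
    for stage in stages:
        named.update(stage["names"])
        with_verb.update(stage["verbs"])
    expected = {"e": "approach", "t": "avoid"}
    return {c: expected[c[0]] for c in sorted(named - with_verb)}

def learned_symbolically(produced_verbs, probes):
    """True if every probe elicits the verb of its high-order category."""
    return all(produced_verbs.get(c) == v for c, v in probes.items())

probes = generalization_probes(STAGES)
print(probes)
```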

This type of symbol acquisition test has been used in ape language studies. In the 
experiment where chimpanzees learned to associate “give” with the names of the solid 
foods banana and orange (Savage-Rumbaugh & Rumbaugh, 1978), animals were then 
tested with new names of foods. Only those animals that made the correct 
generalization were considered to have learned symbolic associations. 
The symbol acquisition test was repeated with 10 different populations. After the 
three learning stages, seven populations produced the correct associations 
e3-“approach” and t3-“avoid”. The success criterion was the production of the correct 
verb for more than 75% of e3 and t3 mushroom types (N=8). In three populations 
the learning of the names for e3 and t3 did not produce the activation of the proper 
verb, which means that these organisms did not learn any symbolic association. In the 
seven successful populations, instead, the language is based on logical relationships 
between the mushrooms’ names and the two verbs. The relationships between words 
and real objects, and between verbs and objects’ names, allow neural networks to 
generalize the association of new names with the correct verb category. 

These results show that neural networks can learn simple languages that use 
symbolic associations. These symbols are grounded in the environment because of the 
ecological simulation framework that allows a direct link between words and the 
objects with which organisms interact. However, the network's simple feedforward 
architecture allows the use of other non-symbolic strategies for language learning. In 
fact, during some simulations organisms appear to learn languages that do not use 
symbolic relationships. More complex and biologically-inspired neural network 
architectures could steer the learning towards symbolic acquisition, rather than simple 
stimulus associations. As neurophysiological data suggest (Deacon, 1997), the 
cortico-cortical connections in the human brain could help explain why humans can 
easily learn symbolic associations, while animals tend to learn conditional stimulus 
associations (except in controlled experiments, as shown in ape language studies). 
Deacon’s analysis of neuropsychological experiments on patients with prefrontal 
cortex lesions suggests that this area, and its connections with other cortical regions, 
could play a major role in symbol acquisition. The absence or underdevelopment of 
the prefrontal cortex in animals would then explain the lack of symbol-based 
languages in animal communication systems. Our simple neural networks are not 
meant to represent any real neural system. However, it is possible to design more 
articulated neural architectures that are inspired by specific connection patterns 
observed in the brain. Research is ongoing into understanding which particular neural 
network architectures allow symbol acquisition in language learning. 

The model described in this paper allowed us to simulate the evolution of 
languages based on simple syntactic rules, and of symbol acquisition. To create a selective 
pressure for the auto-organization of useful languages the experimental condition 
made the foraging behavior highly dependent on the parents’ linguistic input (as 
mushroom features are only available 10% of the time). Moreover, the 
communication between parents and children, and the cultural transmission of 
language in the Naming Task, allowed the auto-organization of a population-shared 
language. The potential of this model for the study of the evolution of animal 
communication and human language is high. In future studies both the experimental 
setting, defining the availability of input mushroom features and of the linguistic 
input, and the model parameters controlling the pattern of communication and 
language learning between organisms, can be systematically changed to test specific 
hypotheses about the origin of language. For example, this model allowed us to focus on 
the important distinction between communication systems based on simple object- 
signal associations and languages based on symbolic relationships. The integration of 
the model with ongoing studies of the role of neural network architectures for the 
learning of symbolic representations will help to test Deacon’s (1997) hypothesis. This 
will allow us to evaluate the role of symbol reference in language origin and in the co-evolution 
of brain structures (i.e. the prefrontal cortex), language, and symbolic acquisition. 
Abstract: This paper shows that realistic and coherent vowel systems can 
emerge from scratch in a population of agents that imitate each other under hu- 
man-like constraints of production and perception. The simulation is extended 
so that populations can change; old agents can be removed, and new agents can 
be added. In these circumstances vowel systems can also emerge and be pre- 
served. It is shown that sometimes an age structure in the population can im- 
prove preservation of the vowel systems. 



1 Introduction 

The study of human languages is not only concerned with describing the huge variety 
of phenomena that are encountered in human languages, but also with finding 
explanations for similar phenomena that recur in languages that are neither related 
historically nor geographically. Such phenomena are called language universals or, because 
there are always exceptions, universal tendencies. This paper is concerned with 
explaining the universal tendencies of human vowel systems. Although the human vocal 
tract is capable of producing a wide variety of different vowels (at least 45 different 
basic vowel qualities [10]), one finds that all human languages use only a very small 
subset of these. Furthermore, these subsets are not chosen randomly, but exhibit 
remarkable regularities. When one looks at the vowel systems of widely different 
languages, such as, for example, the 451 languages in the UPSID (UCLA 
Phonological Segment Inventory Database [12, 13]), one finds that the maximum 
number of vowel qualities in any language is 15 (in Norwegian [10]) while the minimum 
number is 3 (although there are languages that are reported to have only two 
vowels [10]). One also finds that certain vowels, for example [i], [a] and [u], occur 
very frequently (in 87%, 87% and 82% of the languages in UPSID, respectively) 
while others, such as [y], [œ] and [ø], occur only rarely (in 5%, 2% and 3% of the 
languages, respectively). But one also finds that the structure of vowel systems tends 
to be regular. For example, if a language contains a front, unrounded vowel of a certain 
height, for example [e] (which occurs in 41% of the languages in UPSID), one 
also tends to find the rounded back vowel of the corresponding height, in this case [o], 
which occurs in 36% of the languages, but in 73% of the languages that also contain 
an [e]. Many such regularities can be found, and typologies of vowel systems have 
been based on them [2, 16]. 
A possible explanation of these phenomena is that they occur because humans have 
an innate, cognitive disposition towards using certain speech sounds and combina- 
tions of speech sounds. Innate distinctive features, rules and markedness constraints 
are proposed. Distinctive features are properties (often binary) of speech sounds that 
are used to distinguish them from each other. Rules determine in what way abstract 
distinctive features can be combined into phonemes (the basic abstract speech sounds 
that can distinguish meanings of words), and how the phonemes are transformed into 
an actual speech signal. Some features are more marked than others, meaning that 
they are less likely to occur. There are a number of problems with this theoretical 
framework, but the most important of these is that it does not actually explain any- 
thing at all. The features, rules and markedness constraints are inferred from observa- 
tions of human sound systems. It would thus be circular to use them to explain the 
very phenomena from which they have been inferred. 

An independent explanation of the structure of vowel systems can be derived from 
functional criteria, such as acoustic distinctiveness and articulatory ease. Lindblom 
and Liljencrants [11] have shown that optimization of acoustic distance between the 
members of a vowel repertoire results in predictions of the most frequently occurring 
vowel systems in human languages. Subsequent improvements on their model (see 
e.g. Schwartz et al. [15]) have confirmed their results. There are two problems, how- 
ever. The first problem is that the number of vowels in the system has to be fixed 
beforehand. The second problem is that the model depends on explicit optimization. It 
shows that human vowel systems are in general optimized for acoustic distinctive- 
ness, but it does not give an explanation of how the optimization takes place in human 
languages. It is clear that human language users do not optimize the sound system of 
their language explicitly. In fact, when learning a language, children imitate their 
parents as accurately as possible, more accurately in fact than strictly necessary for 
successful communication. This can be observed from the fact that children do not 
only learn the language of their parents, but also their dialect. Although people 
speaking different dialects can often understand each other perfectly, they are never- 
theless aware of subtle distinctions in pronunciation. The optimization model is in- 
complete. It explains why vowel systems are the way they are — they are optimized — 
but not how they have become optimized. 
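For comparison with the self-organizing approach, the dispersion criterion usually associated with Lindblom and Liljencrants, minimizing the sum of 1/d² over all pairs of vowels, can be sketched as below. The unit square standing in for the reachable acoustic space and the random-search optimizer are simplifying assumptions, not their procedure. Note that the number of vowels is a parameter that must be fixed in advance, which is the first problem pointed out above, and that the optimization is explicit.

```python
import itertools
import random

def dispersion_energy(vowels):
    """Sum over all vowel pairs of 1/d^2, with d the distance in a 2-D acoustic space."""
    total = 0.0
    for (x1, y1), (x2, y2) in itertools.combinations(vowels, 2):
        d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
        if d2 == 0.0:
            return float("inf")        # coinciding vowels are maximally confusable
        total += 1.0 / d2
    return total

def clamp(v):
    return min(1.0, max(0.0, v))

def optimize_dispersion(n_vowels, steps=5000, step_size=0.05, seed=0):
    """Random-search minimization in a unit square; n_vowels is fixed beforehand."""
    rng = random.Random(seed)
    vowels = [(rng.random(), rng.random()) for _ in range(n_vowels)]
    energy = dispersion_energy(vowels)
    for _ in range(steps):
        i = rng.randrange(n_vowels)
        x, y = vowels[i]
        candidate = list(vowels)
        candidate[i] = (clamp(x + rng.uniform(-step_size, step_size)),
                        clamp(y + rng.uniform(-step_size, step_size)))
        new_energy = dispersion_energy(candidate)
        if new_energy < energy:        # keep only improvements
            vowels, energy = candidate, new_energy
    return vowels, energy

vowels, energy = optimize_dispersion(3)
```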

The work described in this paper is based on the theory that self-organization in a 
population of language users drives sound systems towards optimality. It is based on 
the theories of Luc Steels of language as a complex dynamic system [17, 18, 19] and 
is also related to other work on the origins and the evolution of language [7, 9]. In the 
theories of Luc Steels, language is considered to be as much a phenomenon of a 
population as it is the knowledge of individuals. This approach does not consider 
language in terms of abstract ideal knowledge of an individual, nor does it deal with 
idealized speaker-hearer interactions. Rather, it stresses the fact that language is an 
open, distributed system, where no member of the population has complete knowl- 
edge of the language, let alone control, where interactions can be messy and incom- 
plete, where new words, meanings and constructions can enter the language, and 
where individual speakers of the language can enter and leave the population. Coher- 
ence in the language is maintained through self-organization. Change, which is 
viewed as an inherent property of language, is driven by speech errors, by contradict- 
ing drives to minimize articulatory effort and to maximize communicative success 
and by conscious creation of new words and expressions. As language is seen as a 
complex dynamic system, the only way to explore it is with computer simulations. 

In the work described in this paper a population of agents that can produce, per- 
ceive and learn speech sounds in a human-like way is modeled. The agent’s goal in 
life is to imitate the other agents in the population as well as possible. The work 
presented here is not the first work that tries to explain the universal tendencies of human 
vowel systems through the interactions of agents in a population. First Glotin [2] and 
later Berrah [1] have both built computer simulations of populations of agents that 
imitate each other’s speech sounds. However, in their simulations, the number of 
vowels in each agent is fixed beforehand and the main process that shapes the vowel 
systems is still explicit optimization. The interactions between the agents only serve 
to make the vowel systems in the population coherent, not optimized. 

In previous papers [3, 4] it has been shown that the proposed model does indeed result 
in coherent and realistic vowel systems. Meanwhile the research has progressed to 
larger populations and more realistic results (compare, for example the figures in [3] 
with the figures in this paper.) Many (extremely boring) experiments have been done 
to test the sensitivity of the system to parameter changes and small changes of algo- 
rithm, but the self-organisation has been shown to be remarkably robust. The inter- 
ested reader is referred to [5]. The main focus of this paper will be on investigating 
the results of population dynamics. Human languages are spoken in open populations. 
Old speakers of the language can die and children have to learn the language. With 
the computer simulations it can be investigated what happens in the case of a chang- 
ing population. This is a test for the model: will the agents preserve vowel systems, 
just as happens in human language, and will it still be possible to have vowel systems 
emerge from scratch? Also socio-linguistic hypotheses that are hard to test in reality 
can be investigated with this model: what happens with different population replace- 
ment rates and is there an advantage in having speakers of different ages learning with 
different rates? These questions are treated in section 3. 

2 The Simulation 

The simulation models the interactions of agents in a population. Each agent is 
equipped with a realistic articulatory synthesizer for producing vowels, a model of 
human perception of speech signals for calculating the distance between different 
signals and an associative memory in which it can store the associations between 
articulatory prototypes, acoustic prototypes, the number of times the prototypes have 
been used and the number of times they have been successfully used. Whenever an 
agent hears an acoustic signal, it calculates the distance between this signal and all its 
acoustic prototypes and considers the prototype that is closest to the signal as the one 
that it has recognized. Vowel signals are represented as the frequencies of the first 
four peaks (formants) of the acoustic spectrum of the vowel. Whenever an agent 
wants to produce a vowel, it takes the vowel’s articulatory prototype and synthesizes 
its formants with the articulatory synthesizer. Some noise is added (by shifting the 
formant frequencies randomly) so that an articulatory prototype is never realized 
acoustically in exactly the same way. The distance between a signal and the acoustic 
prototypes of vowels is calculated in a two-dimensional space with the logarithm of the 
frequency of the first formant as one dimension and the logarithm of the frequency of 
the effective second formant as the second dimension. The distance is calculated as a 
weighted Euclidean distance in which the effective second formant is given a 
weight of 30% relative to the first formant. 

The effective second formant frequency is a non-linearly weighted sum of the sec- 
ond, third and fourth formant frequencies. It is based on the observation that due to 
the limited resolution of human perception of high frequencies, signals that have 
multiple peaks at high frequencies are perceived as identical to signals that have only 
one peak at high frequencies. The position of this single peak is a function of the 
position of the peaks in the original signal. In this paper it is calculated by a function 
that is based on Mantakas et al.'s [14] weighting function. It would have been better 
to calculate the distance in the original 4-dimensional acoustic space, but all available 
models of human vowel perception were based on the effective second formant. 

The interactions between the agents have been called imitation games. For each 
imitation game, two agents are selected from the population at random. One of the 
agents will play the role of initiator, the other will play the role of imitator. Basically, 
the initiator selects an acoustic vowel prototype from its repertoire at random and 
synthesizes the acoustic signal for this vowel. The imitator listens to this signal and 
finds its acoustic prototype that is closest to it. It then synthesizes the articulatory 
prototype that is associated with this acoustic prototype. The initiator listens to this 
signal and finds its closest acoustic prototype. If this is associated with the articula- 
tory prototype it originally produced, the imitation game is successful. If not, it is a 
failure. This information is communicated non-verbally to the imitator. 
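One imitation game can be sketched as below. For brevity the articulatory synthesizer is replaced by a hypothetical stand-in that returns the stored acoustic prototype plus small random shifts, and both agents are assumed to have non-empty repertoires; the distance has the same weighted form as before, with the weight 0.3 again an assumed reading.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]   # (F1, effective F2) in Bark

def distance(a: Point, b: Point, lam: float = 0.3) -> float:
    return math.sqrt((a[0] - b[0]) ** 2 + lam * (a[1] - b[1]) ** 2)

@dataclass
class Agent:
    vowels: List[Point] = field(default_factory=list)

    def produce(self, i: int, rng: random.Random, noise: float = 0.1) -> Point:
        # Hypothetical stand-in for articulatory synthesis plus acoustic noise.
        f1, f2 = self.vowels[i]
        return (f1 + rng.uniform(-noise, noise), f2 + rng.uniform(-noise, noise))

    def recognize(self, signal: Point) -> int:
        return min(range(len(self.vowels)), key=lambda i: distance(signal, self.vowels[i]))

def imitation_game(initiator: Agent, imitator: Agent, rng: random.Random) -> bool:
    i = rng.randrange(len(initiator.vowels))        # initiator picks a vowel at random
    heard = imitator.recognize(initiator.produce(i, rng))
    reply = imitator.produce(heard, rng)            # imitator answers with its own vowel
    return initiator.recognize(reply) == i          # success if the initiator hears vowel i

rng = random.Random(0)
a = Agent([(3.0, 14.0), (7.0, 10.0), (3.0, 9.0)])
b = Agent(list(a.vowels))
print(sum(imitation_game(a, b, rng) for _ in range(100)))
```

An imitator with fewer vowels than the initiator fails whenever the initiator picks a vowel the imitator has merged with another one, which is the situation that triggers adding a new vowel.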

In reaction to the imitation game, both the initiator and the imitator update their 
repertoires of vowels. The use counts of the vowel prototypes that were produced are 
increased, as are their success counts if the game was successful. Whenever the 
success/use ratio of a vowel drops below a certain value (0.7 throughout the paper) and 
its use count is sufficiently high (5 throughout the paper), it is removed from the 
agent’s repertoire. The imitator can make further changes to its repertoire. If the 
imitation game was successful, the imitator shifts the articulatory prototype it used so 
that its acoustic realization matches the signal that was heard more closely. This is 
done in order to increase the coherence in the population. If the imitation game was a 
failure, and if the vowel that the imitator used had a high success/use ratio, the imitator 
adds a new vowel to its repertoire that closely matches the signal that was heard. The 
articulatory prototype of this vowel is determined by a hill-climbing heuristic: the agent 
makes a first guess, produces it, listens to itself, makes small articulatory adjustments 
and iteratively improves its first guess. The vowel is added because the other agent 
probably had two vowel prototypes at a location where this agent had only one, thus 
causing confusion. If the success/use ratio of the vowel that was used is low, however, 
it is moved closer to the perceived signal in an attempt to improve it. Finally, both the 
imitator and the initiator merge vowels that come too close together in acoustic or 
articulatory space. 
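The imitator's bookkeeping after a game can be sketched as follows. The thresholds 0.7 and 5 come from the text; treating "sufficiently high" as at least 5 uses, treating a "successful" vowel as one with a success/use ratio of at least 0.7, shifting in acoustic rather than articulatory space, adding the new vowel directly at the heard signal instead of via hill climbing, and the shift step are all assumptions of this sketch. Merging of close vowels is omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple

MIN_SUCCESS_RATIO = 0.7   # removal threshold used throughout the paper
MIN_USES = 5              # use count needed before a vowel can be removed
SHIFT_STEP = 0.1          # hypothetical fraction of the gap closed per shift

@dataclass
class Vowel:
    f1: float             # acoustic stand-in for the articulatory prototype
    f2: float
    uses: int = 0
    successes: int = 0

    def ratio(self) -> float:
        return self.successes / self.uses if self.uses else 0.0

def record_use(vowel: Vowel, success: bool) -> None:
    vowel.uses += 1
    if success:
        vowel.successes += 1

def prune(repertoire: List[Vowel]) -> List[Vowel]:
    """Drop vowels whose success/use ratio is below 0.7 after at least 5 uses."""
    return [v for v in repertoire
            if not (v.uses >= MIN_USES and v.ratio() < MIN_SUCCESS_RATIO)]

def shift_towards(vowel: Vowel, signal: Tuple[float, float]) -> None:
    vowel.f1 += SHIFT_STEP * (signal[0] - vowel.f1)
    vowel.f2 += SHIFT_STEP * (signal[1] - vowel.f2)

def imitator_update(repertoire: List[Vowel], used: Vowel,
                    signal: Tuple[float, float], success: bool) -> List[Vowel]:
    record_use(used, success)
    if success:
        shift_towards(used, signal)                      # improve coherence
    elif used.ratio() >= MIN_SUCCESS_RATIO:
        repertoire.append(Vowel(signal[0], signal[1]))   # split a confused category
    else:
        shift_towards(used, signal)                      # try to repair a poor vowel
    return prune(repertoire)
```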

When these interactions are iterated a large number of times in the population, 
realistic vowel systems emerge. The process of emergence is illustrated in figure 1. In 
this figure, the acoustic vowel prototypes of all the agents in a population of twenty 
agents are plotted in the acoustic space formed by the first formant (on the vertical 
axis) and the effective second formant (on the horizontal axis) in the logarithmic Bark 
frequency scale. Note that the scales on the axes increase from top to bottom and from 
right to left, respectively. This has been done in order to get the vowel prototypes in a 
configuration that corresponds to the way phoneticians usually plot vowels, with front 
vowels towards the left and high vowels towards the top of the graph. Note that due to 
articulatory limitations, the acoustic space that can be reached by the agents is limited
to a triangular region with the apex at the bottom of the graph. Whenever there is a 
cluster in the graph, this means that most agents in the population have an acoustic 
prototype near this position. The number of clusters in the graph indicates the number 
of vowel prototypes of each agent. 

The frames in figure 1 show the vowel systems in the population after 50, 500, 
2500 and 15 000 imitation games from left to right. It can be seen that initially, the 
vowel prototypes are spread randomly through the available acoustic space. After 500 
imitation games, some structure becomes visible. Vowel prototypes cluster together 
and multiple clusters start to form. After 2500 imitation games, the clusters are dis- 
persed more evenly throughout the available acoustic space and after 15 000 imitation 
games, the clusters have become compact and more or less evenly dispersed. When 
this stage has been reached, the agents’ vowel systems remain relatively stable. Al- 
though they never stop changing completely, the possible changes are relatively 
small, consisting mostly of small shifts of the clusters and the occasional merging of 
existing clusters or emergence of a new cluster if space is available. The acoustic 
noise setting for this graph was 10%. 

In order to get an idea of the realism of the emerging vowel systems, several meas- 
ures can be defined, for example the energy of the vowel systems as defined by Lil- 
jencrants and Lindblom [11], the average success of imitation or the average number 
of vowels per agent. These will be used later in this paper. However, direct inspection 
and classification of the agents’ emerging vowel systems is also a good way to get an 
idea of the realism of the emerging systems. An example of the classification of 
emerging vowel systems with five vowels is given in figure 2. This figure was formed 
by doing 25 000 imitation games in 100 populations of 20 agents. The acoustic noise 
was set to 15%. Of the emerging vowel systems, 49 consisted of five vowels (the 
number of vowels in the final vowel systems is not determined beforehand). From 
each of these populations, one random agent was selected. The vowel systems of 
these agents were classified according to the relative position of the vowels in the 
system in the same way as Crothers [2] has classified vowel systems of human
languages. It can be seen that the different types occurred (from left to right) in 88%, 8%
and 4% of the cases. When the results are compared with those of human languages 
one finds a very good match. Schwartz et al. [15] found that of the languages in UP- 
SID, 89% were of the leftmost type, and the middle and rightmost type both occurred 
in 5% of the cases. Equally good matches were found for other numbers of vowels, 
although there were some problems for very low and very high numbers of vowels 
(see [5, chapter 6] for more details). 
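Of these measures, the energy is the one that benefits most from a concrete definition. Following [11], and as used for Table 1, it is the sum, over all pairs of vowels in an agent's system, of the reciprocal squared distance between them. For a fixed number of vowels, lower energy therefore means a better-dispersed system. The minimal sketch below makes three assumptions: Euclidean distance between (F1, F2') prototypes, no two prototypes coinciding (the model merges vowels that come too close), and made-up example values.

```python
import itertools
import math


def energy(vowels):
    """Sum of 1 / d**2 over all unordered pairs of vowel prototypes."""
    return sum(1.0 / math.dist(a, b) ** 2
               for a, b in itertools.combinations(vowels, 2))


def average_size(population):
    """Average number of vowels per agent (each agent is a list of prototypes)."""
    return sum(len(agent) for agent in population) / len(population)


if __name__ == "__main__":
    spread = [(3.0, 14.0), (7.0, 11.0), (3.0, 9.0)]      # well-dispersed system
    crowded = [(3.0, 14.0), (3.5, 13.5), (3.0, 13.0)]    # vowels bunched together
    print(energy(spread), energy(crowded))
```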

Apparently the model makes realistic and coherent vowel systems emerge in a 
population by no other process than local interactions between the agents under ar- 
ticulatory and acoustic constraints. 



3 An Open Population 

In the real world, populations of language users are not static. People die and
children are born: old speakers continually leave the population and new speakers
enter it. This usually does not disrupt the language. Research on the social dynamics
of language using traditional methodologies is extremely difficult, as the factors
influencing language change are very hard to identify, let alone control. In the
artificial setting of the imitation game, carefully controlled experiments with
changing populations can be performed, and these may yield more insight into the
social dynamics of real language. A number of experiments are presented that give an
idea of the possibilities of applying computer simulations
to the study of sound change. Other parts of language, most notably the emergence of 
lexicon and semantics, have already been studied by other researchers (see e.g. [20]). 

The population is changed by removing and adding agents at random. Making this 
process random was deemed to be the most realistic (as in human populations birth 
and death are basically also random processes) and the least likely to introduce arti- 
facts due to regularities in the replacement. In every imitation game in which it par- 
ticipates, an agent has a certain probability of being removed. Also, in every imitation 
game there is a small probability of an agent being added to the population. Of course, 
the newly added agent’s vowel repertoire is still empty. It will have to learn the sound 
system of the population to which it is added. The probabilities of adding new agents 
and removing old ones will have to be equal so that the number of agents in the 
population remains relatively stable. However, depending on the value of the prob- 
abilities, the rate of replacement in the populations can be high or low. 
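The replacement process can be sketched as a step executed once per imitation game. This is a hypothetical reading of the text. To keep the expected numbers of removals and additions equal, it gives one randomly chosen participant of each game the removal probability p and adds a fresh agent with the same probability p. The `Agent` class and the guard that keeps at least two agents are placeholders introduced for the sketch.

```python
import random
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality, so remove() drops exactly this agent
class Agent:
    vowels: list = field(default_factory=list)  # new agents start with no vowels


def open_population_step(population, p, rng):
    """Pick the two players of one game, then apply random removal and addition."""
    initiator, imitator = rng.sample(population, 2)
    # ... the imitation game between initiator and imitator would be played here ...
    if len(population) > 2 and rng.random() < p:
        population.remove(rng.choice([initiator, imitator]))
    if rng.random() < p:
        population.append(Agent())  # must learn the sound system from scratch
    return initiator, imitator


if __name__ == "__main__":
    rng = random.Random(0)
    population = [Agent(vowels=["i", "a", "u"]) for _ in range(50)]
    for _ in range(2500):
        open_population_step(population, 0.1, rng)
    print(len(population), "agents,", sum(1 for a in population if a.vowels), "original")
```

In this sketch the population size is stable only in expectation and performs a random walk around its starting value.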

A comparison between different rates of replacement is shown in figure 3. The 
leftmost frame of this figure shows the original vowel system that was used as the 
starting configuration for all other experiments presented in this section. This vowel
system was obtained from running 25 000 imitation games in a population of 50 
agents with an acoustic noise of 10%. The second graph from the left shows the 
vowel system of a population of agents in which the probability of adding or remov- 
ing agents was 0.1. It shows the situation after 2500 imitation games. With these 
parameter settings, it was unlikely that any agent from the original population still 
remained in the final population. The third graph from the left shows a vowel system 
from a population with a probability of adding or removing agents of 0.01. As it is 
unfair to compare populations that have undergone unequal numbers of replacements, 
this population’s vowel system is shown after 25 000 imitation games. The rightmost 
frame shows the vowel systems of a population with a replacement probability of 
0.001 after 250 000 imitation games. From these graphs it is clear that the higher the 
replacement rate, the worse the agents are able to preserve the original vowel system. 
This has nothing to do with the absolute number of replacements. These are approxi- 
mately equal for all graphs shown. Rather, the time in which agents stay in the popu- 
lation (and therefore the time they have to learn the vowel system) determines how 
accurately a vowel system is preserved. This does not mean that the systems with a 
high replacement probability do not become stable. After a while they settle in a state 
in which they contain fewer prototypes and the prototype clusters are more dispersed. 
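A little arithmetic confirms that the three runs undergo approximately the same number of replacements while differing in how long each agent stays. The sketch assumes, beyond what the text states, that each game replaces one of its two players with probability p. An agent then takes part in a given game with probability 2/N and is the one removed with probability p/2, so its expected lifetime is N/p games. The value N = 50 is the size of the starting population.

```python
def expected_replacements(p, games):
    """Expected number of agents replaced in `games` games with per-game probability p."""
    return p * games


def expected_lifetime(p, n_agents):
    """Expected number of games an agent survives, assuming each game removes one of
    its two players with probability p, i.e. (2 / n_agents) * (p / 2) per game."""
    return n_agents / p


if __name__ == "__main__":
    for p, games in [(0.1, 2_500), (0.01, 25_000), (0.001, 250_000)]:
        print(f"p={p}: {expected_replacements(p, games):.0f} replacements, "
              f"run length = {games / expected_lifetime(p, 50):.0f} lifetimes")
```

Under these assumptions, all three settings give an expected 250 replacements, and each run lasts about five expected agent lifetimes. The runs therefore differ only in how long each agent has to learn the vowel system.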

In fact, coherent and realistic vowel systems can even emerge from scratch in an 
open population. This is illustrated in figure 4. It shows a similar emergence of a 
vowel system as figure 1. The replacement probability was 0.01 and the acoustic 
noise was 10%. The emergence is slightly slower than in figure 1, but this is due to
the larger population (50 agents instead of 20). There are fewer vowel
prototypes per agent and the prototype clusters are slightly bigger. Nevertheless, a
coherent, successful and realistic vowel system emerges even though at the end of the 
run, no agent that was present at the beginning is present anymore. 

In the experiments shown so far, there is no cost for articulatory effort. If an agent 
wants to imitate a signal it has heard, it can talk and listen to itself an unlimited num- 
ber of times in order to perfect its vowel prototypes. In reality, such articulatory effort 
probably has a certain cost. Therefore an experiment was done in which the number 
of times an agent can talk to itself for improving a new sound is limited to 10. The 
interesting finding of this experiment was that it then becomes advantageous to intro- 
duce an age structure in the population, so that younger agents can change their ar- 
ticulatory vowel prototypes more quickly than older agents. The older agents thus 
provide a stable target to which the younger agents can move their vowels. The results 
are shown graphically in figure 5. The original vowel systems are shown in gray, the 
final vowel systems, obtained after 15 000 imitation games, are shown in black. In the
leftmost and rightmost frames, the populations do not have an age structure. The 
difference between the two figures is that the step size with which the articulatory 
vowel prototypes were improved was 0.01 in the leftmost and 0.03 in the rightmost 
frame (0.03 was the value that was used in all other experiments presented so far). In 
the two middle frames, young agents used a step size of 0.03 and old agents tended
towards a step size of 0.01. The agents’ step sizes changed in the following way:

$\varepsilon_t \leftarrow \varepsilon_{t-1} + \alpha\,(0.01 - \varepsilon_{t-1})$    (1)

where ε_t is the step size at time t and α is a parameter that determines the speed with
which the agents change their step sizes. The value of α was 0.1 for the left center
frame and 0.01 for the right center frame. The differences in performance between the
systems are hard to judge from the figure, so table 1 gives quantitative measures of
the systems. The success is the average communicative success
of the agents in the population. The energy of the vowel systems is calculated
according to [11] as the sum of the reciprocal squared distances between all pairs of
vowels in an agent’s vowel system. The size is the average number of vowels in the
agents’ vowel systems and the similarity is the communicative success that can be 
achieved between an agent that has the original vowel system and an agent that has 
the vowel system that emerges at the end of the simulation run. 
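Equation (1) describes an exponential relaxation of each agent's step size from its initial value ε_0 towards an asymptotic value ε_∞ (0.01 in the paper). It has the closed form ε_t = ε_∞ + (ε_0 − ε_∞)(1 − α)^t. The minimal sketch below compares the iteration with that closed form. One unit of time is assumed, hypothetically, to be one update per imitation game the agent takes part in.

```python
def step_sizes(eps0, eps_inf, alpha, steps):
    """Iterate eq. (1): eps_t = eps_{t-1} + alpha * (eps_inf - eps_{t-1})."""
    eps = eps0
    history = [eps]
    for _ in range(steps):
        eps = eps + alpha * (eps_inf - eps)
        history.append(eps)
    return history


def step_size_closed_form(eps0, eps_inf, alpha, t):
    """Same quantity in closed form: eps_inf + (eps0 - eps_inf) * (1 - alpha) ** t."""
    return eps_inf + (eps0 - eps_inf) * (1.0 - alpha) ** t


if __name__ == "__main__":
    # The two age-structured settings of Table 1: eps0 = 0.03, eps_inf = 0.01.
    for alpha in (0.1, 0.01):
        h = step_sizes(0.03, 0.01, alpha, 100)
        print(alpha, round(h[10], 4), round(h[100], 4))
```

With α = 0, the step size stays at its initial value. This corresponds to the two populations without age structure.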



Population:   ε_0 = 0.01       ε_0 = 0.03       ε_0 = 0.03       ε_0 = 0.03
              ε_∞ = 0.01       ε_∞ = 0.01       ε_∞ = 0.01       ε_∞ = 0.03
              α = 0            α = 0.1          α = 0.01         α = 0

Success:      0.8531 ± 0.040   0.7701 ± 0.035   0.7930 ± 0.041   0.8041 ± 0.041
Energy:       4.04 ± 0.59      5.55 ± 1.10      5.10 ± 0.95      3.83 ± 0.72
Size:         4.77 ± 0.40      5.66 ± 0.63      5.61 ± 0.58      5.14 ± 0.50
Similarity:   0.7347 ± 0.023   0.8231 ± 0.028   0.8235 ± 0.032   0.7884 ± 0.032

Table 1: Measures of the populations with age structure.

The statistics were obtained from 100 runs of each system. Shown are the averages 
for every measure and their standard deviations. Using the Kolmogorov-Smirnov test
it can be shown that the vowel system sizes and the similarity measures in the
populations with age structure are significantly higher (at the 1% level) than in the
populations without age structure, indicating that the original vowel system is preserved
better. The fact that the success is lower for the two systems with age structure is not 
important, as success always tends to be a little lower for larger systems. 
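The significance test mentioned above compares two samples of per-run measures, for instance the 100 vowel-system sizes obtained with and without age structure. A minimal pure-Python sketch of the two-sample Kolmogorov-Smirnov statistic and its standard asymptotic p-value follows. In practice, a library routine such as `scipy.stats.ks_2samp` would normally be used. Three caveats apply: the small-sample correction to λ is the textbook approximation due to Stephens, this two-sided version only tests whether the distributions differ (the direction is read off the means), and the samples in the example are made up.

```python
import bisect
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the supremum is attained at one of the sample points
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided p-value Q_KS(lam) = 2 * sum (-1)^(j-1) exp(-2 j^2 lam^2)."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the series converges poorly here and Q_KS is ~1 anyway
    s = sum(2.0 * (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
            for j in range(1, terms + 1))
    return min(1.0, max(0.0, s))


if __name__ == "__main__":
    without_age = [5, 5, 4, 5, 5, 4, 5, 5, 5, 4]  # made-up vowel-system sizes
    with_age = [6, 6, 5, 6, 6, 6, 5, 6, 6, 6]
    d = ks_statistic(without_age, with_age)
    print(d, ks_pvalue(d, len(without_age), len(with_age)))
```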
These experiments show that it can sometimes be beneficial for old agents to learn
less readily than young agents.

4 Conclusion and Future Work 

The experiments presented here have shown that a) coherent and realistic vowel sys- 
tems can emerge from scratch in a population of agents that imitate each other under 
constraints of production and perception and b) that the same simulation can be used 
to investigate the consequences of population dynamics on the emergence and the 
preservation of sound systems. The first result is of direct relevance for phonetics. It
shows that the agents do not need an innate predisposition towards certain structures
in order for these structures to emerge. Traditionally it has been assumed that the
presence of certain phonetic and phonological structures in unrelated human languages
is proof that these structures are innate. This research shows that they can just as
well be the result of self-organization in a population. The second result is not
directly relevant for linguistics, but it does show that the methodology can be used
successfully for investigating phenomena as complex as the dynamics of populations of
language users. Incidentally, it also shows that the self-organizing emergence is
extremely robust and happens even if the population is changing. Finally, the results of
the simulations with age structure show that the fact that older language users learn 
less quickly than younger ones might not just be an unfortunate consequence of aging, 
but might help to preserve sound systems (and possibly other aspects of language) 
across the generations. 

Of course, the simulation is still very simple. From the point of view of phonetics, 
one of the main things that needs to be investigated is whether it also works with 
more complex sounds. If one wants to investigate sound change seriously, one needs 
to take interactions between different speech sounds into account. Also, most of the 
more interesting universal tendencies of human sound systems have to do with conso- 
nants and combinations of sounds. 

From the point of view of population dynamics, more complex experiments with 
spatial distribution of agents, or with populations with different sound systems that 
come into contact with each other can be conceived. Some of these experiments have 
already been done for other aspects of language (see e.g. [20]). 

In any case, the use of computer simulations of populations of agents that learn and 
use realistic speech sounds has proven to yield interesting results. The experiments
done thus far are only a beginning, but they show that the technique is promising and 
can help to shed new light on the complexities of human language. 
Abstract. This paper addresses the emergence of a common phonetic
code in a society of communicating speech agents using evolutionary 
techniques. Predictions for the large vowel systems of the world’s lan- 
guages using the Maximum Use of Available distinctive Features (MUAF) 
principle are discussed. Simulations of the use of supplementary phonetic 
features in large vowel systems are presented. These experimental results
show how simple local rules of interaction between speaking agents are 
sufficient to explain some of the universal characteristics of the phono- 
logical structure of the world’s languages. 



1 Introduction 

Whatever the language, speech production can be viewed as a succession of
closing and opening gestures of the vocal tract, i.e. syllables made up
of consonants C and vowels V respectively. Maddieson and Precoda [9] gathered 
phonological descriptions of a representative sample of the world’s languages. 
They obtained an inventory consisting of 920 possible phonemes including 654 
consonants, 177 vowels and 89 diphthongs. One of the conclusions of this work 
is that a majority of languages use five vowels and, roughly, twenty consonants. 
Although the numbers of possible combinations, $C_{177}^{5}$ for V and $C_{654}^{20}$ for C, are very high,
typologies [8, 13] show that phonological systems do not have an
arbitrary structure. 
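The scale of these numbers can be made concrete. The number of ways of choosing 5 vowels out of the 177 in the inventory, and 20 consonants out of the 654, can be computed directly with Python's `math.comb` (available from Python 3.8):

```python
from math import comb

vowel_systems = comb(177, 5)       # ways to pick 5 of the 177 vowels
consonant_systems = comb(654, 20)  # ways to pick 20 of the 654 consonants

print(f"{vowel_systems:,} possible five-vowel inventories")
print(f"about 10^{len(str(consonant_systems)) - 1} twenty-consonant inventories")
```

The first number is about 1.4 × 10^9 and the second exceeds 10^37. This is why the non-arbitrary structure revealed by the typologies is remarkable.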

To explain these regularities, generativist linguists assume that the universal
tendencies are axiomatically imposed. For instance, Chomsky [2] holds
that all humans are provided at birth with the capability to recognise a common
set of distinctive features. During the first years of language acquisition, this set 
is filtered in order to keep only the distinctive features of the mother tongue. 

We propose, however, a radically different point of view, which posits that
sound system universals emerge from the basic characteristics of the speech
production and perception mechanisms, as well as from the speaker/listener
interaction (cf. [7]). According to this approach, favoured sound systems are
those that allow the best accomplishment of the communication task taking into 
account the constraints of the speech production and perception systems. 

Along these lines, several studies have addressed the universal tendencies of 
the world’s languages, resulting in a series of prediction models [6, 11]. These 
models allow, on the one hand, the prediction of the structure of phonological 
systems, and on the other hand the general explanation of why certain sounds 
imposed themselves in the world’s languages. However, these models address
neither the evolutionary processes nor the aspects of the interaction between
speakers that give rise to these universal tendencies.

The central interest of this paper is the study of prediction models of sound 
systems, in particular those that determine the emergence of phonological
inventories from principles of interaction between communicating agents. This
approach is innovative and relates to other recent work, such as that of Steels [12], within
the framework of Artificial Life. Our goal is twofold: first, to model the formation 
of phonetic inventories by investigating the basic mechanisms of the interaction 
between speakers and listeners and, second, to relate those mechanisms to the 
emergence of sound systems. 

The “SPEech Communicating agEnt Society” (SPECIES) model of vowel 
perceptual exchanges in a society of speech agents is described. In the presented 
simulations, we explore the role of the principle of “Maximum Use of Available 
distinctive Features” (MUAF) in structuring phonetic inventories. This principle 
is invoked to explain why languages with crowded vowel inventories tend to use 
additional phonetic features. 

2 Description of the Emergence Model 

Speech communication can be seen as a sequence of perceptual objectives
“negotiated” between a speaker and a listener. An important question immediately
arises: how are the perceptual objectives negotiated? The answer to this
question lies in a deep understanding of the physiological and cognitive
mechanisms involved in speaker-listener interactions. Of course, this is an overwhelming
task. In the simulations presented here, we implemented a simplified version of
local interactions between speech agents, who can communicate using vowel-like, 
static sounds. 

The main aim is to make a “society” of speech agents that can adapt their 
productions through local interactions, called hereafter transactions. For each 
simulation, the number of distinct phonetic items (i.e. the size of the lexicon) is 
fixed in advance. This is in contrast with other similar recent studies, in which 
the number of items can vary during the simulation of a society of communicating 
agents [3]. 

Speech agents in our model are simple agents which can produce sustained 
vowels, represented in a perceptual space, which is composed of subspaces, each 
one representing a particular vocalic feature. Although some of these subspaces 
are unidimensional (adequately describing vowel features like duration or
nasalisation), we will always consider one of the subspaces to be a bidimensional formant
space. Formants are the frequency values at which maxima of the acoustic
spectrum occur. It is well known that vowels are well characterised by the first two
formants (F1 and F2) alone [4], which justifies the choice made in the present
study. In a former version of our model [5], the articulatory level was also in- 
cluded. We decided not to include it in the present work in order to thoroughly 
understand both the mechanisms and the implications of the perceptual level 
alone. 

Each agent in the society has a lexicon composed of a fixed number of items 
initialised randomly. Pairs of agents are randomly chosen to communicate using 
an item of their lexica. We define an interaction between two agents as a trans- 
action, in which one of the agents takes the place of the speaker and the other 
that of the listener. The speaker randomly chooses one of its items and produces
it. The listener relates the features of the signal sent by the speaker to its own 
lexicon at the perceptual level. Finally, the listener adjusts its items in order to 
adapt its lexicon to this new sensed item. 

The listener’s adaptation to a newly sensed item is the central aspect
of the simulations. Indeed, lexicon convergence, the key to our problem, depends
in large part on this adaptation. We assume that a person who learns a new
word (or sound) will first try to repeat this word (or sound). So a speech agent
that perceives an unknown item will try to repeat it. This is achieved by
moving the nearest item in the agent’s formant space towards the perceived
item (attraction), and moving the others away in order to reduce the risk of confusion
and to maximise perceptual distinctiveness (repulsion). The details are published
in [1].
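The exact update rules are given in [1] and are not reproduced here. The following is only a minimal sketch of one transaction under assumed rules, with these hypothetical choices:

* Items are points in a normalised 2-D formant plane, clipped to [0, 1].
* The listener's nearest item is moved a fraction `attract` of the way towards the perceived item.
* Every other item is pushed directly away from the perceived item by a distance `repel / d`, which shrinks with its distance d.
* The rates `attract` and `repel` take the illustrative values in the code.

```python
import math
import random


def clip(x):
    """Keep a coordinate inside the normalised formant space [0, 1]."""
    return max(0.0, min(1.0, x))


def transaction(speaker, listener, rng, attract=0.1, repel=0.01):
    """One speaker -> listener exchange; lexica are lists of (F1, F2) items."""
    signal = rng.choice(speaker)  # speaker randomly chooses and produces an item
    nearest = min(range(len(listener)), key=lambda i: math.dist(listener[i], signal))
    updated = []
    for i, (f1, f2) in enumerate(listener):
        if i == nearest:
            # Attraction: move part of the way towards the perceived item.
            f1 += attract * (signal[0] - f1)
            f2 += attract * (signal[1] - f2)
        else:
            # Repulsion: push away from the perceived item, harder when closer.
            d = math.dist((f1, f2), signal)
            if d > 0:
                f1 += repel * (f1 - signal[0]) / d ** 2
                f2 += repel * (f2 - signal[1]) / d ** 2
        updated.append((clip(f1), clip(f2)))
    listener[:] = updated


if __name__ == "__main__":
    rng = random.Random(0)
    society = [[(rng.random(), rng.random()) for _ in range(3)] for _ in range(10)]
    for _ in range(5000):
        speaker, listener = rng.sample(society, 2)
        transaction(speaker, listener, rng)
    print(society[0])
```

The lexicon size (three items here) is fixed in advance, as in the model described above.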

3 Experimental Validation 

Despite the striking simplicity of our model, the simulations replicate well-known 
phenomena observed in phonetic inventories of the world’s languages. We will 
present only the simulation results of predictions for large vowel systems. Simu- 
lation results of predictions for vowel systems with small numbers of items are 
published in [1]. 

In this set of simulations, we explore what Ohala called the principle of 
“Maximum Use of Available distinctive Features” (MUAF), which plays a role 
in the structuring of phonetic inventories [10]. The MUAF principle is invoked 
to explain why languages with crowded vowel inventories tend to use additional 
features, like nasalisation and duration, to preserve distinctiveness. 

We propose a modified version of our emergent model to reproduce the 
MUAF principle in vowel systems. The goal is to incorporate the fact that once 
a feature is used, it is explored systematically before going to the next. The fea- 
tures that provide greater gain in perceptual distinctiveness are explored first. 
The repulsion of different items in the listener lexicon is done using only one of 
the two features at a time (namely that for which the gain in distinctiveness is 
greater). 
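The one-feature-at-a-time repulsion can be illustrated as follows. This is a sketch under stated assumptions, not the rule of [1]:

* Items are (F1, F2, extra) triples scaled to [0, 1].
* Both candidate moves have the same hypothetical step length.
* The "gain in perceptual distinctiveness" is taken to be the resulting Euclidean distance from the perceived item.

```python
import math


def clip(x):
    """Keep a feature value inside [0, 1]."""
    return max(0.0, min(1.0, x))


def muaf_repel(item, signal, step=0.05):
    """Push `item` away from `signal` along a single feature: either the formant
    plane (F1, F2) or the additional feature, whichever move leaves `item`
    farther from `signal`."""
    # Candidate 1: move within the formant plane, directly away from the signal.
    d1, d2 = item[0] - signal[0], item[1] - signal[1]
    norm = math.hypot(d1, d2)
    if norm > 0:
        formant_move = (clip(item[0] + step * d1 / norm),
                        clip(item[1] + step * d2 / norm), item[2])
    else:
        formant_move = item  # no defined direction in the formant plane
    # Candidate 2: move along the additional feature only.
    de = item[2] - signal[2]
    if de > 0:
        direction = 1.0
    elif de < 0:
        direction = -1.0
    else:
        direction = 1.0 if item[2] < 1.0 else -1.0
    feature_move = (item[0], item[1], clip(item[2] + direction * step))
    # Keep the candidate with the larger gain in distinctiveness.
    return max((formant_move, feature_move), key=lambda c: math.dist(c, signal))
```

Consider an item pressed against the edge of the formant space, for example `muaf_repel((0.0, 0.5, 0.5), (0.3, 0.5, 0.5))`. It can gain nothing in the formant plane, so the additional feature is recruited instead. This mirrors the idea that crowded inventories turn to further features.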







In order to investigate this idea, an experiment with vowels was conducted.
We added a perceptual feature, which makes the perceptual space three-dimensional
(two dimensions for the formants and one for the additional feature).
Fig. 1 shows the results. The right panels represent projections of the 3-D perceptual
space (left panels) onto the F1-F2 plane. Fig. 1(a.1) and Fig. 1(a.2) show a 5-vowel
system, /i,e,a,o,u/, the most widespread system in the UPSID database
of the world’s languages [8]. The new feature is not
systematically explored in Fig. 1(a.2), contrary to the case shown in Fig. 1(b),
which represents a 9-vowel system. One could say that this last system is also a
variant of /i,e,a,o,u/, but the interesting point is that, as shown in Fig. 1(b.1),
vowels tend to use the additional feature, which is quite clear for /i/, /a/, /e/
and /u/. The same observation can be made for the 12-vowel system represented
in Fig. 1(c), where an additional feature is recruited for 5 vowels.



[Figure 1 panels: (1) the 3-D perceptual space and (2) its projection onto the F1-F2 plane, for (a) 5, (b) 9 and (c) 12 vowels.]

(3)   9 vowels                 12 vowels
      type     languages       type     languages
      9+0      17              7+5      8
      5+4      4               6+6      4
      6+3      2               12+0     2
      7+2      2               8+4      1
      8+1      1               9+3      1

Fig. 1. Final lexica for simulations in an extended perceptual space, composed of the
formant space and an additional feature: (a) 5 vowels, (b) 9 vowels and (c) 12 vowels.
The table in (3) summarizes the statistics for the vowel system types with 9 and 12
vowels present in UPSID. n+m means a system with n basic vocalic timbres and m
modified (nasalized, longer, etc.) vowels. See [1] for details.



These simulation results fit with the data on the world’s languages. Indeed, 
the 9-vowel system shown in Fig. 1(b) is observed in four languages, while 50%
of the sixteen 12-vowel systems indexed in UPSID use an additional feature for
5 vowels like the system shown in Fig. 1(c) [8]. 







4 Conclusions and Further Developments 

The SPECIES model is a platform for exchanges of vowels within a community 
of speech agents. The attainment of a common vowel lexicon shared by all the 
agents is a consequence of the cooperative aspects inherent in the exchanges 
between the agents. Certainly, the account of the speech interaction in the
model is still quite crude (this is a work in progress), but our simulations provide 
a framework for more realistic studies, which would not be possible with other 
approaches, like [11]. Simulation experiments related to the prediction of vowel 
systems were conducted. The predictions obtained are in accordance with the 
statistics of the world’s languages. In spite of its predictive power, this model is 
not yet fully realistic. In the future, an articulatory model of speech production, 
as well as more complex linguistic units like syllabic consonant-vowel transitions 
may be incorporated into SPECIES. Finally, we think that the above results on 
vowel universals support the hypothesis that language is a phenomenon which is
actively created by speakers and listeners, evolving under functional constraints
such as exchange and cooperation based on perceptual distinctiveness.
Abstract. We report on a case study in the emergence of a lexicon in
a group of autonomous distributed agents situated and grounded in an
open environment. Because the agents are autonomous, grounded, and
situated, the possible words and possible meanings are not fixed but 
continuously change as the agents autonomously evolve their communi- 
cation system and adapt it to novel situations. The case study shows that 
a complex semiotic dynamics unfolds and that generalisations present in
the language are due to processes outside the agent. 



1 Introduction 

In recent years it has become clear that the complex adaptive systems approach 
pioneered by Artificial Life research can fruitfully be applied to the study of the 
origins and evolution of language [9], particularly to the emergence of shared 
sound systems [3], the self-organisation of lexicons [7], [11], grounded word
meaning [12], and the origins of grammar [4], [1], [5]. In all this research, the same
mechanisms for the generation and maintenance of complexity are used as
in other Artificial Life research, and a similar complex dynamics can
be seen to emerge.

This paper focuses on grounded lexicons as they emerge from the local
interactions of a group of distributed autonomous robotic agents, grounded in a
real world physical environment through visual sensing. Consequently, and in 
contrast with other work so far, the meanings of words are no longer given as 
symbols supplied by a designer, nor is it assumed that hearers have perfect 
knowledge of what meaning is intended by a speaker. Rather the agents must 
autonomously infer the possible meanings of unknown words from their visual 
interpretation of the situations they encounter. Agents never get immediate feed- 
back on whether they had the right meaning, only whether the communication 
was successful. Grounding introduces two additional difficulties. First, the ontology
must be sufficiently robust to handle variations in the data. Second, anomalies in
perception arise from differences between the agents’ perceptions (for example, one agent
segmenting the scene into different objects than the other). This gives rise to additional
ambiguities, which need to be damped by the semiotic dynamics. We show that under
these conditions, which are more realistic with respect to human natural lan- 
guage acquisition and use, a very complex semiotic dynamics is generated which 
nevertheless manages to self-organise into a successful communication system. 







2 The Talking Heads Experiment 

The robotic setup used for the experiments in this paper consists of a set of 
‘Talking Heads’ connected through the Internet. Each Talking Head features 
a Sony EVI-D31 camera with controllable pan/tilt motors for horizontal and 
vertical movement (figure 1), a computer for cognitive processing (perception, 
categorisation, lexicon lookup, etc.), a screen on which the internal states of 
the agent currently loaded in the body are shown, a TV-monitor showing the 
scene as seen through the camera, and devices for audio in- and output. Agents 
can load themselves in a physical Talking Head and teleport themselves to an- 
other Head by travelling through the Internet. By design, an agent can only 
interact with another one when it is physically instantiated in a body located 
in a shared physical environment. The experimental infrastructure also features 
a commentator which reports and comments on dialogs, displays measures of 
the ontologies and languages of the agents and game statistics, such as average 
communicative success, lexical coherence, average ontology and lexicon size, etc. 
For the experiments reported in this paper, the shared environment consists of 
a magnetic white board on which various shapes are pasted: colored triangles, 
circles, rectangles, etc. 




Fig. 1. Two Talking Head cameras and associated monitors showing what each camera 
perceives. 



The guessing game The interaction between the agents consists of a lan- 
guage game, called the guessing game. The guessing game is played between two 
visually grounded agents. One agent plays the role of speaker and the other one 
then plays the role of hearer. Agents take turns playing games so all of them 
develop the capacity to be speaker or hearer. Agents are capable of segmenting 
the image perceived through the camera into objects and of collecting various 
sensory data about each object, such as the color (decomposed into RGB
channels), average gray-scale or position. The set of objects and their data constitute
a context. The speaker chooses one object from this context, further called the 
topic. The other objects form the background. The speaker then gives a verbal 
hint to the hearer. 

The verbal hint is an utterance that identifies the topic with respect to the 
objects in the background. For example, if the context contains [1] a red square, 
[2] a blue triangle, and [3] a green circle, then the speaker may say something 
like “the red one” to communicate that [1] is the topic. If the context also contains
a red triangle, he has to be more precise and say something like “the red
square”. Of course, the Talking Heads do not say “the red square” but use their
own language and concepts, which are never going to be the same as those used in
English. For example, they may say “malewina” to mean [UPPER EXTREME-LEFT
LOW-REDNESS]. In this experiment the verbal hint is assumed to be
transmitted completely accurately. 

Based on the verbal hint, the hearer tries to guess what topic the speaker 
has chosen, and he communicates his choice to the speaker by pointing to the 
object. A robot points by transmitting in which direction he is looking. The 
game succeeds if the topic guessed by the hearer is equal to the topic chosen by 
the speaker. The game fails if the guess was wrong or if the speaker or the hearer 
failed at some earlier point in the game. In case of a failure, the speaker gives
an extra-verbal hint by pointing to the topic he had in mind, and both agents
try to repair their internal structures to be more successful in future games.

Agents start with neither a designer-supplied ontology nor a lexicon. A shared
ontology and lexicon must emerge from scratch in a self-organised process. The
agents therefore not only play the game but also expand or adapt their ontology
and lexicon to be more successful in future games.



The Conceptualisation Module Meanings are categories that distinguish
the topic from the other objects in the context. The categories are organised
in discrimination trees where each node contains a discriminator able to filter
the set of objects into a subset that satisfies a category and another one that
satisfies its opposite. For example, there might be a discriminator based on the
horizontal position (HPOS) of the center of an object (scaled between 0.0 and
1.0), sorting the objects in the context into a bin for the category ‘left’ when
HPOS < 0.5 (further labeled as [HPOS-0.0,0.5]) and one for ‘right’ when
HPOS > 0.5 (labeled as [HPOS-0.5,1.0]). Further subcategories are created by
restricting the region of each category. For example, the category ‘very left’
(or [HPOS-0.0,0.25]) applies when an object’s HPOS value is in the region
[0.0,0.25].

A distinctive category set is found by filtering the objects in the context from 
the top in each discrimination tree until there is a bin which only contains the 
topic. This means that only the topic falls within the category associated with 
that bin, and so this category uniquely filters out the topic from all the other 
objects in the scene. Often more than one solution is possible, but all solutions 
are passed on to the lexicon module. 
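The tree structure and the top-down search for a distinctive category can be sketched as follows. This is a minimal illustration under stated assumptions, not the Talking Heads implementation: objects are dictionaries of sensory channels scaled to [0.0, 1.0], a category is refined by halving its region, and the names `Node`, `split` and `distinctive_categories` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One category in a discrimination tree: a region [low, high) on one channel."""
    channel: str
    low: float
    high: float
    children: list = field(default_factory=list)

    def label(self) -> str:
        # Same notation as the text, e.g. [HPOS-0.0,0.5]
        return f"[{self.channel}-{self.low},{self.high}]"

    def matches(self, features: dict) -> bool:
        v = features[self.channel]
        # Half-open region; the value 1.0 is assigned to the top-most bin.
        return self.low <= v < self.high or v == self.high == 1.0

    def split(self) -> list:
        """Refine this category by halving its region (assumed growth step)."""
        mid = (self.low + self.high) / 2
        self.children = [Node(self.channel, self.low, mid),
                         Node(self.channel, mid, self.high)]
        return self.children


def distinctive_categories(roots, topic, context):
    """Descend every tree from the top and collect each category whose bin
    contains the topic and nothing else. `context` maps object names to
    feature dicts; `topic` is one of those names."""
    found = []

    def walk(node, names):
        for child in node.children:
            inside = [n for n in names if child.matches(context[n])]
            if topic not in inside:
                continue                     # bin does not hold the topic
            if len(inside) == 1:
                found.append(child.label())  # topic alone: distinctive
            else:
                walk(child, inside)          # still ambiguous: refine further

    for root in roots:
        walk(root, list(context))
    return found


hpos = Node("HPOS", 0.0, 1.0)
left, right = hpos.split()   # 'left' [HPOS-0.0,0.5] and 'right' [HPOS-0.5,1.0]
left.split()                 # 'very left' [HPOS-0.0,0.25] and [HPOS-0.25,0.5]
context = {"a": {"HPOS": 0.1}, "b": {"HPOS": 0.4}, "c": {"HPOS": 0.8}}
print(distinctive_categories([hpos], "a", context))  # ['[HPOS-0.0,0.25]']
print(distinctive_categories([hpos], "c", context))  # ['[HPOS-0.5,1.0]']
```

For object "a", the bin for ‘left’ still contains "b", so the search descends one level and stops at ‘very left’; with several trees (one per channel), every tree contributes its own solutions.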
The discrimination trees of each agent are formed using a growth and pruning
dynamics coupled to the environment, which creates an ecology of distinctions. 
Discrimination trees grow randomly by the addition of new categorisers splitting 
the region of existing categories. Categorisers compete in each guessing game. 
The use and success of a categoriser is monitored and categorisers that are irrel- 
evant for the environments encountered by the agent are pruned. More details 
about the discrimination game can be found in [10]. 
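The growth-and-pruning loop can be pictured with the sketch below. It is only an illustration of the idea described above: the halving split, the pruning rule and its thresholds (`min_uses=20`, `min_rate=0.2`) are invented for this example, and the actual criteria are those given in [10].

```python
import random
from dataclasses import dataclass


@dataclass
class Categoriser:
    channel: str
    low: float
    high: float
    uses: int = 0        # games in which this categoriser was used
    successes: int = 0   # of those, games that succeeded


def grow(pool, rng):
    """Pick a random existing categoriser and add the two halves of its region
    (skipping halves that already exist)."""
    parent = rng.choice(pool)
    mid = (parent.low + parent.high) / 2
    existing = {(c.channel, c.low, c.high) for c in pool}
    for low, high in ((parent.low, mid), (mid, parent.high)):
        if (parent.channel, low, high) not in existing:
            pool.append(Categoriser(parent.channel, low, high))


def record(cat, success):
    """Monitor use and success after a guessing game."""
    cat.uses += 1
    cat.successes += int(success)


def prune(pool, min_uses=20, min_rate=0.2):
    """Keep categorisers that are still young or that pay off often enough."""
    return [c for c in pool
            if c.uses < min_uses or c.successes / c.uses >= min_rate]


rng = random.Random(1)
pool = [Categoriser("G", 0.0, 1.0)]
grow(pool, rng)                         # adds [G-0.0,0.5] and [G-0.5,1.0]
for _ in range(30):
    record(pool[1], success=False)      # [G-0.0,0.5] keeps failing
    record(pool[2], success=True)       # [G-0.5,1.0] keeps succeeding
pool = prune(pool)
print([(c.low, c.high) for c in pool])  # [(0.0, 1.0), (0.5, 1.0)]
```

Categorisers that have not yet been used enough are protected from pruning, so new distinctions get a chance to prove themselves before they are judged irrelevant.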

Verbalisation module The lexicon of each agent consists of a two-way
association between forms (which are individual words) and meanings (which are
single categories). Each association has a score. Words are random combinations
of syllables. When a speaker needs to verbalise a category, he looks up all pos- 
sible words associated with that category, orders them and picks the one with 
the best score for transmission to the hearer. When a hearer needs to interpret 
a word, he looks up all possible meanings, tests which meanings are applicable 
in the present context, i.e. which ones yield a possible single referent, and uses 
the remaining meaning with the highest score as the winner. The topic guessed 
by the hearer is the referent of this meaning. 
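The speaker and hearer lookups described above can be sketched as follows. This is a hypothetical minimal version: the `Lexicon` class, the initial score of 0.5 and the `referent_of` callback (which stands in for applying a category to the current context) are assumptions made for illustration.

```python
class Lexicon:
    """One agent's two-way form-meaning associations, each with a score."""

    def __init__(self):
        self.score = {}  # (form, meaning) -> score

    def add(self, form, meaning, score=0.5):
        self.score.setdefault((form, meaning), score)

    def produce(self, meaning):
        """Speaker: the best-scoring word for `meaning`, or None if there is none."""
        candidates = [(s, f) for (f, m), s in self.score.items() if m == meaning]
        if not candidates:
            return None
        return max(candidates, key=lambda c: c[0])[1]

    def interpret(self, form, referent_of):
        """Hearer: among the meanings of `form` that pick out a single object in
        the current context, return (meaning, referent) for the best score.
        `referent_of(meaning)` returns that object, or None if the meaning does
        not single out exactly one object."""
        best = None
        for (f, m), s in self.score.items():
            if f != form:
                continue
            referent = referent_of(m)
            if referent is not None and (best is None or s > best[0]):
                best = (s, m, referent)
        return None if best is None else best[1:]


lex = Lexicon()
lex.add("fepi", "[G-0.25,0.5]", 0.6)
lex.add("xu", "[G-0.25,0.5]", 0.8)
lex.add("xu", "[HPOS-0.0,0.5]", 0.9)
# Hypothetical context: the green category singles out 03, the position does not.
referents = {"[G-0.25,0.5]": "03", "[HPOS-0.0,0.5]": None}
print(lex.produce("[G-0.25,0.5]"))         # xu
print(lex.interpret("xu", referents.get))  # ('[G-0.25,0.5]', '03')
```

Note that the hearer ignores the higher-scoring meaning [HPOS-0.0,0.5] because it does not yield a single referent in this context.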

Based on feedback on the outcome of the guessing game, the speaker and 
the hearer update the scores. When the game has succeeded, they increase the
score of the winning association and decrease the scores of its competitors,
thus implementing lateral inhibition. When the game has failed, they each
decrease the score of the association they used. Occasionally new associations are stored. A speaker
creates a new word when he does not have a word yet for a meaning he wants 
to express. A hearer may encounter a new word he has never heard before and 
then store a new association between this word and the best guess of the possible 
meaning. This guess is based on first guessing the topic using the extra-verbal 
hint provided by the speaker, and on performing categorisation using his own 
discrimination trees as developed thus far. These lexicon bootstrapping mecha- 
nisms have been explained and validated extensively in earlier papers [11]. 
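The score dynamics can be summarised in a short sketch. The step size `delta=0.1`, the clamping to [0, 1], the syllable inventory and the choice to treat both same-form and same-meaning associations as competitors are all assumptions for illustration; the validated mechanisms are those of [11].

```python
import random

SYLLABLES = ["xu", "fe", "pi", "pa", "si", "ri", "me", "bi", "ma", "le"]  # hypothetical


def clamp(x):
    return max(0.0, min(1.0, x))


def update(scores, form, meaning, success, delta=0.1):
    """Adjust one agent's scores after a game. `scores` maps (form, meaning)
    to a value in [0, 1]."""
    used = (form, meaning)
    if success:
        scores[used] = clamp(scores[used] + delta)
        for other in scores:
            # Competitors: same form with another meaning, or same meaning
            # expressed by another form (lateral inhibition).
            if other != used and (other[0] == form or other[1] == meaning):
                scores[other] = clamp(scores[other] - delta)
    else:
        scores[used] = clamp(scores[used] - delta)


def new_word(rng, n_syllables=2):
    """Speaker creates a word as a random combination of syllables."""
    return "".join(rng.choice(SYLLABLES) for _ in range(n_syllables))


def adopt(scores, form, meaning, initial=0.5):
    """Hearer stores a new association for an unknown word, using the meaning
    obtained by categorising the topic the speaker pointed at."""
    scores.setdefault((form, meaning), initial)


scores = {("xu", "[G-0.25,0.5]"): 0.5, ("fepi", "[G-0.25,0.5]"): 0.5,
          ("xu", "[HPOS-0.0,0.5]"): 0.5, ("pasi", "[G-0.375,0.5]"): 0.5}
update(scores, "xu", "[G-0.25,0.5]", success=True)
# "xu"/green goes up; "fepi"/green and "xu"/HPOS are inhibited; "pasi" is untouched.
word = new_word(random.Random(7))
adopt(scores, word, "[G-0.375,0.5]")
```

Because inhibition only reaches associations sharing a form or a meaning with the winner, unrelated words such as "pasi" are unaffected by a single game.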

The conceptualisation module proposes several solutions to the verbalisation
module, which prefers those that have already been lexicalised. Agents monitor
success of categories in the total game and use this to target growth and pruning. 
The language therefore strongly influences the ontologies agents retain. The two 
modules are structurally coupled and thus get coordinated without a central 
coordinator. 

Semiotic Dynamics We propose the notion of a semiotic landscape (which
we also call RMF-landscape) to analyse grounded semiotic dynamics. The semiotic
landscape is a graph whose nodes are referents (objects), meanings (categories)
and forms (words), with a link between two nodes if the items associated with
them indeed co-occur (figure 2). The relations are labeled RM for referent to
meaning, MR for meaning to referent, RF for referent to form, FR for form to
referent, FM for form to meaning, and MF for meaning to form. The RMF-landscape
in figure 2 (taken from the experiment to be discussed later) contains an example
where the same object 03 is designated by two meanings, [G-0.25,0.5] and
[G-0.375,0.5]. The first meaning, which is more general, is expressed by two
words, "xu" and "fepi", and the second meaning by the word "pasi". Usually we
see much more complex situations, and complexity further increases when the
same meaning is also used to denote other referents (which is obviously very
common and indeed desirable). We track the changes in the semiotic landscape by
recording the actual verbal behavior of the agents while they engage in language
interactions, more specifically, by collecting data on the co-occurrence of items
such as the forms used with a certain referent or the meanings used with a
certain form. Frequency of co-occurrence is represented in competition diagrams,
such as the RF-diagram in figure 6, which plots the evolution of the frequency of
the observed referent-form co-occurrences for a given referent in a series of
games. Similar diagrams can be made for the FR, FM, MF, RM and MR relations.

Fig. 2. A semiotic landscape represents the co-occurrences between referents, meanings
and forms. [Figure: meaning nodes [G-0.25,0.5] and [G-0.375,0.5]; form nodes XU, FEPI, PASI.]
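Competition diagrams can be computed directly from a game log. The sketch below assumes, for illustration only, that each game is logged as one (referent, meaning, form) record from the speaker's side; the function names and the toy log are hypothetical and do not reproduce the experiment's data.

```python
from collections import Counter

REFERENT, MEANING, FORM = 0, 1, 2  # positions in a (referent, meaning, form) record


def competition_diagram(games, key_pos, value_pos, key, window):
    """Relative frequency of each value co-occurring with `key`, per window of
    games. RF diagram: key_pos=REFERENT, value_pos=FORM; MF diagram:
    key_pos=MEANING, value_pos=FORM; and so on for FR, FM, RM and MR."""
    series = []
    for start in range(0, len(games), window):
        counts = Counter(g[value_pos] for g in games[start:start + window]
                         if g[key_pos] == key)
        total = sum(counts.values())
        series.append({v: c / total for v, c in counts.items()} if total else {})
    return series


def landscape(games):
    """Co-occurrence counts for the links of the RMF landscape."""
    links = Counter()
    for r, m, f in games:
        links[("R", r), ("M", m)] += 1
        links[("M", m), ("F", f)] += 1
        links[("R", r), ("F", f)] += 1
    return links


# Hypothetical game log, one record per game.
games = ([("03", "[G-0.25,0.5]", "xu")] * 3 + [("03", "[G-0.25,0.5]", "fepi")]
         + [("03", "[G-0.375,0.5]", "pasi")] * 2 + [("03", "[G-0.25,0.5]", "fepi")] * 2)
print(competition_diagram(games, REFERENT, FORM, "03", window=4))
# [{'xu': 0.75, 'fepi': 0.25}, {'pasi': 0.5, 'fepi': 0.5}]
```

Plotting each form's share across successive windows gives curves like those in the RF- and MF-diagrams; the windowing is an assumption, since the text does not state how frequencies are smoothed.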

3 Case Study 

For real-world environments, the set of possible referents is infinite, so the
semiotic landscape is infinite. For purposes of analysis, we therefore need to
restrict the possible environments, and thus the possible referents, artificially
and then study the semiotic dynamics very precisely. For the present paper, we
analyse a test run involving 20 agents and 8 objects, which means
4 × C(8,4) = 4 × 70 = 280 possible situations (a context of 4 objects and a
choice of topic among them). The run starts with 4 objects (and hence 4 possible
situations), and after every 5000 games a new object is introduced. During the
final 15000 games, no new objects are introduced. The overall evolution of the
dynamics is shown in figure 3. We see that success in communication or
discrimination mounts quickly, drops after a new object is introduced, but
increases again as the agents develop new concepts and words. We also see that
it takes less and less time to absorb new objects, indicating that the language
is not situation-specific, and that the system evolves towards maximal
communicative and discriminatory success.
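The counts above can be checked with a few lines of arithmetic. The sketch assumes, as the product 4 × C(8,4) suggests, that a situation is a context of four objects together with the choice of one of them as topic; the function names are hypothetical.

```python
from math import comb


def n_situations(n_objects, context_size=4):
    """Contexts of `context_size` objects drawn from those available, times the
    choice of topic within the context."""
    return comb(n_objects, context_size) * context_size


def total_games(start_objects=4, final_objects=8, games_per_object=5000,
                final_phase=15000):
    """Length of the run: one new object every 5000 games, then 15000 more."""
    return (final_objects - start_objects) * games_per_object + final_phase


print(n_situations(4), n_situations(8), total_games())  # 4 280 35000
```

The total of 35000 games agrees with the length of the series shown in figure 3.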

The main goal of this paper is to show how an analysis in terms of the
semiotic dynamics helps us understand the evolution of the system. We examine
one word, "fepi", which is general in the sense that it is clearly used for more
than one object (03 and 05) and in many different contexts (figure 4, left).

Fig. 3. The graph shows the communicative and discriminatory success for a series of
35000 language games.



Closer examination of the meanings (figure 4, right) reveals that "fepi" ‘means’
a particular shade of green, categorising a green intensity in [0.25,0.5] on the
G channel of the RGB data. The category is labeled [G-0.25,0.5]. When examining
the forms used to express this category (figure 5), we see that "fepi" has indeed
emerged as dominant for this meaning, but that earlier on the word "xu" was
dominant, which raises the question of how "fepi" managed to overtake "xu" in the
long run.

When inspecting the game traces in more detail, we see that "fepi" is created
in game 328 by agent-3, playing the role of speaker, in order to refer to object 03
using the meaning [G-0.25,0.5]. Agent-19, as hearer, acquires at this point the
same meaning [G-0.25,0.5] for "fepi". In one sense, we could say that agent-19
learns this meaning of "fepi" from agent-3, but that is not entirely accurate.
Agent-19 constructed a possible meaning for "fepi", and this happened to be the
same as the one used by agent-3, but this is accidental. This is a first
important observation. Agents learn the language from others only indirectly.
They construct a language which is compatible with the language used by others
in the situations encountered, and in turn, through their own use, influence the
language of other speakers. Compatibility between individual language systems
occurs through the positive feedback between language use and communicative
success.

As figure 6 (left) shows, "fepi" is not immediately successful. Instead, "xu"
wins the initial competition against "fepi" and several other words for
designating 03. Typically, multiple ways to designate the same object develop,
but this synonymy is damped by the lateral inhibition between the different
forms competing for the same meaning.

"xu" also has the meaning [G-0.25,0.5], although some other meanings, which are
also distinctive for 03, are associated with "xu" as well. Because agents only get
feedback about reference and not about meaning, words remain polysemous until
they are disambiguated by situations in which their different shades of meaning
are incompatible.

Fig. 4. Left: FR diagram showing the objects referred to by the word "fepi". After
10000 games "fepi" is consistently used for 03 and 05. Right: FM diagram showing
the meanings of "fepi".

Fig. 5. MF diagram showing the different words circulating in the population for
expressing the concept [G-0.25,0.5].

Fig. 6. Left: RF diagram showing the different words being used for identifying 03.
Right: RF diagram showing the different words used for 05.

In game 5000, the arrival of a new object 05 destabilises the association
between "xu" and [G-0.25,0.5]. Closer examination reveals that 05's green value
is slightly lower (in the range [0.25,0.375]) than that of 03 (which is in the range
[0.375,0.5]), so that a more refined distinction is necessary if both objects occur
in the same context. As seen from the RF-diagram in figure 6 (left), "xu" is no
longer used for 03. Instead, the word "pasi" comes to dominate. "pasi" indeed has
the more specific meaning [G-0.375,0.5]. At the same time, we see from the
RF-diagram for 05 (figure 6, right) that the word "rimebi" dominates for
designating 05. As expected, "rimebi" has the other more specific meaning
[G-0.25,0.375]. The more general word "xu" is still useful in contexts where the
refined distinction is not necessary, so we would expect "xu" to continue to
exist. However, this is not the case. "xu" loses out completely, and its role is
taken over by "fepi". Why is this so? "xu" loses its strength because (1) the main
meaning of "xu" is often not distinctive enough, so that games fail, and (2) other
meanings competing for "xu" gain strength (as seen from figure 7, left), pulling
down the original green-based meaning through lateral inhibition. The weakening
of "xu" opens the way to "fepi", which still carries the more general meaning of
green and does not have competitors. We see from the RM diagram (figure 7, right)
how first a general meaning coupled to "xu" is used for 03, then a more specific
meaning coupled to "pasi" (after game 5000), and then again a more general
meaning coupled to "fepi". The more general meaning regains ground because, with
the arrival of several new objects, the more abstract green category becomes
more useful again, even though the more specific meaning is still occasionally
needed.
Fig. 7. Left: FM diagram showing the different meanings of "xu". After game 5000, the
meaning of "xu" becomes unclear and the word falls into disrepute. Right: RM diagram
showing the meanings used to identify 03.

A similar picture is seen for the meanings used for 05, where "rimebi" and "fepi"
are used depending on the degree of distinction required by the context.

4 Conclusion 

The main goal of the paper was to analyse a case study of semiotic dynamics 
based on data recorded from a group of 20 distributed autonomous robotic agents 
playing language games about real world scenes through visual observations. We 
see that the word-meaning pairs active in a population show a complex picture.
The agents never have exactly the same lexicon; each has its own idiolect.
Moreover, periods of heavy competition alternate with periods of relative
stability. Stability occurs when one word temporarily manages to become dominant
in a winner-take-all process. Semiotic dynamics therefore shows similarities to
the dynamics exhibited by other types of complex adaptive systems, such as
punctuated equilibria in species evolution or evolution in adaptive cooperative
games [6]. It illustrates the hypothesis that language is an evolving complex
dynamical system which self-organises and gets transmitted in a cultural process.
Abstract. We study the evolution of communication where concepts
are developed individually by agents and relations between concepts and
forms (words, signals) are learned through interaction with other agents.
By constructing concepts based on experience with the same environment,
agents develop similar conceptual systems. Concepts represent situations
in the environment. The system of associations between forms
and meanings is viewed as a dynamical system. This paper presents
first results of investigating the phase space of the system. The analysis
contributes to an understanding of the interaction between association
strengths of different agents and of different meanings.



Introduction 

In studying the evolution of communication, an important question is that of 
how stable systems connecting concepts and forms (words, signals) may come 
about. In this paper, we will be investigating the case where these connections are 
learned by a group of agents in a shared environment. The concepts themselves 
are formed by the agents based on experience within the environment using 
autonomous concept formation [de Jong, 1999]. 

Although the fact that simple local adaptive behavior of agents causes a sys- 
tem of communication to emerge is interesting in itself, a deeper understanding 
of this process is valuable. Since the complete system of agents, concepts and 
forms is a complex adaptive system (see e.g. [Steels, 1996a], [Steels, 1997]) con- 
taining a large and varying number of changing elements, it is difficult to predict 
the behavior of the system given the rules which govern it. A viewpoint that has 
been found to be more appropriate for these types of systems is the dynamic 
systems approach. 

In the literature, several examples of dynamic systems approaches to language 
exist. A number of researchers have investigated motor behavior during speech,
e.g. [Saltzman, 1995], [Browman and Goldstein, 1995], [Tuller and Kelso, 1990].
In [Elman, 1995], an analysis of the activation space of recurrent neural networks
is presented as a dynamical view on grammar and embedding. A difference between
that work and ours is that grammar does not play a role here. Instead,
the focus here is on the use of language, and concepts are grounded by use in
the environment, see [Steels, 1996b].

* An extended version is available as AI-MEMO from the VUB AI Lab.

In section 1 the alarm calls experiment and the association adaptation mech- 
anism are briefly explained. In section 2 the resulting system of communication 
is described and the phase space of the system is examined. The final section 
presents conclusions. 

1 Environment and Behavior of Individual Agents 

In this section we describe the environment used in the experiments and the 
mechanism that, when used by individual agents in that environment to adapt 
form-meaning associations and to determine their situation, leads to a stable 
system of communication. 

The Environment The task faced by the agents is to occupy a safe position 
within a 10 by 3 grid whilst being hunted by predators. At each timestep, each 
agent in turn selects an action and receives a success value. The action consists 
of choosing the vertical position (bottom, middle or top row) and a horizontal 
displacement (1 step to either side, or staying). Three predators exist. When 
one is present, only a single row is safe (success 1); agents in the other two 
rows receive success 0. Sensor information consists of an agent’s horizontal and 
vertical position, and a number indicating the predator (none, or predator 1, 2 
or 3). The predator indicator is not completely reliable, though; this is where the
benefit of communication comes in. In 10% of the cases when a new predator 
arrives, an agent is not able to see it, depending on its horizontal position, and 
this remains so until it leaves (after 25 timesteps). 
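The environment can be sketched as a few small functions. Several details are not given in the text and are labeled as assumptions here: which row is safe for which predator (`SAFE_ROW`), that every row is safe when no predator is present, that agents cannot move past the grid edges, and that missing a predator is drawn at random rather than depending on horizontal position.

```python
import random

WIDTH, HEIGHT = 10, 3           # 10 by 3 grid; rows 0 (bottom), 1 (middle), 2 (top)
SAFE_ROW = {1: 2, 2: 0, 3: 1}   # hypothetical: the one safe row for each predator
PREDATOR_STAY = 25              # timesteps a predator remains
P_UNSEEN = 0.10                 # chance of missing a newly arrived predator


def success(row, predator):
    """Success value for an agent in `row` while `predator` (1-3 or None) is present."""
    if predator is None:
        return 1  # assumption: with no predator, every row is safe
    return 1 if row == SAFE_ROW[predator] else 0


def act(x, row, dx):
    """An action chooses a row and a horizontal step dx in {-1, 0, +1}."""
    if row not in range(HEIGHT) or dx not in (-1, 0, 1):
        raise ValueError("invalid action")
    return min(WIDTH - 1, max(0, x + dx)), row   # assumption: walls at the edges


def sense(x, row, predator, blind):
    """Sensor reading: position plus predator indicator (hidden if blind)."""
    return x, row, None if blind else predator


def arrival_blindness(rng, n_agents):
    """Per-agent flags for a new predator; the text ties this to horizontal
    position, while this sketch simply draws it at random."""
    return [rng.random() < P_UNSEEN for _ in range(n_agents)]
```

A blind agent keeps the flag for the whole `PREDATOR_STAY` timesteps, which is exactly the situation in which a signal from another agent becomes useful.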

Formation of Situation Concepts Based on the relationship between 
sensor information and success, agents learn to distinguish different situations. 
Knowing which situation it is in allows an agent to choose optimal actions.
Situation concepts are formed using autonomous concept formation, described 
in [de Jong, 1999]. Briefly, the method as it is used here identifies regions in the 
sensor-action space where success is constant. Since concepts are formed based 
on experience, the number of concepts an agent will develop is not known in 
advance, and may differ among agents. 

Adaptation of Form-Meaning Association Strength Once agents have
concepts at their disposal that capture relevant situations in the environment, 
a basis for developing communication is present. At each timestep, every agent 
produces a signal corresponding to its perceived situation. When some agent does 
not receive sensory information that would allow it to determine its situation, this 
information may be obtained through communication. If the association between 
a signal received and a situation concept other than the one indicated by the 
sensors is strong enough, the agent can deduce its actual situation from that 
signal and act accordingly. This has been shown in [de Jong, 1999]. Associations
between meanings (situation concepts) and forms (words, signals) are adapted 
locally by each agent based on (1) frequency of occurrence and (2) success in 
determining the situation. The system is open with respect to the signals the 
agents use, and the number of signals is not fixed. 
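The exact adaptation rule is given in [de Jong, 1999] and is not reproduced here. As a rough stand-in, the sketch below uses a moving-average update (pairs that co-occur often are updated often, and the target depends on success) and a threshold test for trusting a received signal; `rate=0.05` and `threshold=0.5` are invented values.

```python
def adapt(strength, signal, situation, success, rate=0.05):
    """Move the signal-situation association towards 1 after success and
    towards 0 after failure; frequently co-occurring pairs are updated often."""
    key = (signal, situation)
    s = strength.get(key, 0.0)
    strength[key] = s + rate * ((1.0 if success else 0.0) - s)


def infer_situation(strength, signal, sensed, threshold=0.5):
    """If the received signal is strongly associated with a situation other
    than the sensed one, act on that situation instead."""
    candidates = [(v, sit) for (sig, sit), v in strength.items()
                  if sig == signal and sit != sensed]
    if candidates:
        v, sit = max(candidates, key=lambda c: c[0])
        if v > threshold:
            return sit
    return sensed


strength = {}
for _ in range(20):
    adapt(strength, "fi", "P1", success=True)
print(round(strength[("fi", "P1")], 3))                       # 0.642
print(infer_situation(strength, "fi", sensed="no predator"))  # P1
```

After n successful co-occurrences the strength equals 1 - 0.95^n, so the association crosses the threshold after 14 games in this toy setting.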

2 Results 




Fig. 1. Left: The evolution over time of the association strength of the word fi with
each agent’s meaning for the first predator (P1). Right: Phase plot for agent 1. Each
point represents a pair of association strengths, one between fi and the meaning for
P1 (always horizontal), and one between fi and another meaning (vertical).



This section presents the results of the experiment. In a particular experiment,
see figure 1 (left), 4 agents developed strong associations between the word
fi and the situation where predator 1 is present. For agent 1, this association
fluctuated (t = 5000) before it was established (t = 8000). In the phase plot in
figure 1 (right), this final situation is represented by the points on the vertical
line to the right (x = 1). The main part of the plot consists of trajectories where
the association between fi and P1 (P1FI) competed with the other meanings.
Of particular interest is the folded curve just right of the middle at the top of
the graph. Here, the strength of P1FI repeatedly reaches high values (the
horizontal position, around 0.75), but returns to lower values each time.
Apparently, the competing meaning (corresponding to the vertical position) was
too strong to be overcome. As we know from the time series, P1FI did move to 1
later on. This is represented by the series of squares starting just below the
line that was mentioned and describing a curve towards the lower right corner,
indicating that the competing meaning gave way and P1FI became established. The
fact that the system’s state never reaches the upper right corner reflects the
competitive nature of form-meaning associations; in the top right corner, a
signal would be simultaneously associated with two meanings. It can be deduced
from the algorithm that such a state is unlikely to be stable. What analyzing
the algorithm does not show is how the system evolves over time, i.e. through
what states it tends to move. That is the result of complex interaction between
the association strengths of multiple agents and multiple meanings and words for
each of those agents. This is why it can be useful to investigate the phase space
of a system, in addition to time series of the relevant data.
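A phase plot like figure 1 (right) pairs two time series point by point. The sketch below builds those points and counts how many fall in each cell of a coarse grid, which makes statements such as "the state never reaches the upper right corner" checkable. The series values are invented for illustration and the function names are hypothetical.

```python
def phase_points(x_series, y_series):
    """Pair two association-strength time series into phase-plot points."""
    return list(zip(x_series, y_series))


def occupancy(points, bins=4):
    """Count points per cell of a bins x bins grid over [0, 1]^2.
    Row index = y cell (0 at the bottom), column index = x cell."""
    grid = [[0] * bins for _ in range(bins)]
    for x, y in points:
        col = min(int(x * bins), bins - 1)
        row = min(int(y * bins), bins - 1)
        grid[row][col] += 1
    return grid


# Hypothetical series: P1FI rises while the competing association falls.
p1fi = [0.2, 0.5, 0.75, 0.6, 0.9, 1.0]
other = [0.8, 0.7, 0.6, 0.8, 0.3, 0.0]
grid = occupancy(phase_points(p1fi, other), bins=4)
print(grid)        # [[0, 0, 0, 1], [0, 0, 0, 1], [0, 0, 1, 1], [1, 0, 1, 0]]
print(grid[3][3])  # 0: no point in the upper-right corner cell
```

The grid is a cheap summary of where the trajectory spends its time; the figure itself plots the individual points.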




Fig. 2. Left: Phase plot of the association strength of the word ci with the first predator
for different pairs of agents. Right: Phase plot of the association strength of the word
bo with the second predator for different pairs of agents.



A rather different pattern is seen in figure 2 (left), which shows a phase
plot where the associations between the word ci and P1 are plotted for pairs of
agents. This word eventually becomes associated with P1 for every agent. What
the phase plot shows us is how the association strengths influence each other.
Whereas near the middle of the graph, where association strengths are moderate,
variation of the strengths appears rather flexible, this changes when the
system reaches the top right corner. In that phase, the association strengths have
been observed (also in other graphs) to tend towards each other, resulting in an
increased density of points around the diagonal line y = x. This is what one
would intuitively expect; once most agents use some word for a certain concept,
the rest will follow and adopt the word as their preferred word for that concept.

Figure 2 (right) shows the association strength patterns for a word (BO) and 
a meaning (P2) that did not become associated. Similar to the previous graph, 
the association strengths of different agents for a particular combination of form 
and meaning tend towards each other when extreme values are reached, though 
this time the values are low instead of high. Qualitatively, this can be understood 
as a word losing its association with a particular concept. When the word is not 
used anymore by some critical mass of the population, the rest follows and also 
discards the word. 
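The convergence towards the diagonal in figure 2 can be quantified. The sketch below, with hypothetical function names and invented strengths for four agents, builds the pairwise phase-plot points for one form-meaning combination and measures their mean |s_i - s_j|, which is proportional to the mean distance from the line y = x.

```python
from itertools import combinations
from statistics import mean


def pairwise_points(strengths):
    """Phase-plot points (s_i, s_j) for every pair of agents at one timestep,
    for one form-meaning combination."""
    return [(a, b) for a, b in combinations(strengths, 2)]


def spread(strengths):
    """Mean |s_i - s_j| over all pairs; proportional to the mean distance
    of the pairwise points from the diagonal y = x."""
    return mean(abs(a - b) for a, b in combinations(strengths, 2))


# Hypothetical strengths of one word-meaning pair for four agents, early and late.
early = [0.2, 0.6, 0.4, 0.7]
late = [0.95, 0.97, 0.96, 0.98]
print(spread(early) > spread(late))  # True: the agents converge near the corner
```

Tracking `spread` over time, separately for pairs whose mean strength is high or low, would give a numerical counterpart to the increased density of points around y = x.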
3 Conclusions

The main conclusion that can be drawn from the experiments is that the evolution
of communication, as it was observed in the experiments presented here,
can appropriately be viewed as the behavior of a dynamical system. Interacting
elements are the vital part of such a system, and the composition of their
behavior soon becomes too complex to analyze directly. By taking a dynamical
systems perspective, different ways of analyzing the system become available. The
analysis of experiments based on phase plots revealed relationships that would
probably not have been found otherwise. In particular, a tendency of different
agents to converge towards the same association strength, both high (1) and low
(0), was found for certain combinations of meanings and signals. Furthermore,
the competition between one form and several meanings within a single agent
could be observed. It may be concluded that investigating the phase space of a
learned system of communication gave new insights and is a promising method.
An interesting continuation of the analysis would be to investigate
relationships between order and control parameters of the system [Kelso, 1995].
Abstract. A new approach to the origins of syntax in human language
is presented. Using computational models of populations of learners, it is
shown that compositional, recursive mappings are inevitable end-states
of a cultural process of linguistic transmission. This is true even if the
starting state is no language at all. It is argued that the way that knowledge
of language is transmitted through a learning bottleneck profoundly
influences its emergent structure. This approach provides a radical alternative
to one in which the structure of language is viewed as an innate,
biological adaptation to communicative pressures.¹



1 The origins of syntax 

Human language is unique among natural communication systems in having a 
compositional and recursive mapping between meanings (mental representations 
to be communicated) and forms (linear strings of, typically phonetic, gestures). 
It is also extremely unusual in the way it is learnt. Each generation acquires at 
least some of the meaning-form mapping by observing the use of the previous 
generation’s mapping. It has been argued that this type of learning of mappings 
is also unique to humans (Oliphant, 1998). In this paper I will explore, using a 
working model of linguistic transmission, the links between these two features 
of language. In particular, I aim to explain the origins of syntax in language by 
looking at general properties of the transmission of learned behaviour. 

The explanation put forward here contrasts radically with what is perhaps 
the dominant approach to the origins of syntax. In many linguists’ view, syntax 
is more or less fully specified by a learning mechanism (the Chomskyan Language 
Acquisition Device, or LAD) which significantly constrains the language learner 
with prior knowledge about the nature of language (Chomsky, 1986). Prior bias- 
es or constraints on a learner are assumed to be innate, and therefore genetically 
determined. For many researchers (see Pinker and Bloom, 1990, for example), this
leads naturally to the conclusion that the best explanation for the structure
of language is one which invokes biological evolution. In other words, the
Language Acquisition Device is a biological adaptation that evolved through
natural selection in response to the need to communicate “propositional
structures over a serial interface” (Pinker and Bloom, 1990, 707).

¹ This work benefited greatly from conversation with and comments from Jim Hurford,
Mike Oliphant, and Ted Briscoe and was supported by ESRC grant R000237551.



2 Evolution without natural selection 

The standard (biological) evolutionary approach to the origins of syntactic
structure typically ignores the dynamics of the social/cultural transmission of
language.² However, recently there has been more interest in the formal and
computational properties of the influence of learning on the dynamic process of
language transmission, historically, from one generation to the next (Niyogi and
Berwick, 1997; Steels, 1997; Briscoe, 1998; Batali, 1998; Hurford, 1998). This
paper explores the hypothesis that this process alone is enough to explain the
emergence of the central unique features of syntax even where the prior bias of the
learners is not subject to evolutionary change, and is relatively unconstraining.

Linguistic transmission can be viewed as a repeated transformation of
information between two domains: the internal or I-domain, and the external
or E-domain³ (Kirby, 1999a). The I-domain contains languages that exist as
grammatical knowledge in individuals’ brains, whereas the E-domain contains
languages that exist as sets of actual utterances. This conception has
interesting parallels with the conception of genetic transmission in biology.
Figure 1 compares the transformation of information between phenotypic and
genotypic domains with the transformation between E- and I-domains.




[Figure: two panels, “Linguistic transmission” and “Biological transmission”.]

Fig. 1. Transformation of information in linguistic and biological adaptive systems.



In both these adaptive systems, the biological and the linguistic, the
transformations that map between the two domains can be seen as “bottlenecks” on
the transmission of information over time through the whole system. For example,
a particular piece of genetic information may not persist because the phenotype
that it expresses may not survive long enough to reproduce. Similarly, a
grammatical regularity may not persist because the utterances that it gives rise
to cannot successfully reconstruct it through learning. Just as the bottleneck on
transmission of genetic information has implications for the structure of the
organisms that emerge (this is the basis of the neo-Darwinian synthesis, after all),
we should expect the equivalent bottleneck on the transmission of grammars to
have an impact on the eventual structure of languages.

² See (Briscoe, 1998; Kirby and Hurford, 1997) for counterexamples: papers which
look at the combined effects of cultural transmission and biological evolution.

³ These domains are parallel to Chomsky’s (1986) I-language and E-language.



3 Modelling linguistic transmission 

In order to test how the learning bottleneck influences the structure of language, 
a working model of linguistic transmission must be created. The model consists 
of a simple population of agents who have a pre-specified “world” of concepts 
about which they can communicate, and a pre-specified set of symbols with 
which they can create utterances. The possible communicative behaviour of an 
agent is determined by that agent’s internal grammar which is learnt solely 
through observation of the behaviour of other agents in the population. The 
population model is “generational” in that members of the population die and 
are replaced, but the initial state of the agents is always the same, the agents 
are not prejudiced or rewarded according to their behaviour in any way, and 
the only information that flows through the system is in the form of utterances 
— in other words, there is no natural selection or biological evolution in the 
simulation. 
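The effect of the learning bottleneck on transmission can be illustrated with a deliberately simplified toy, which is not the model described here: learners merely memorise the meaning-form pairs they happen to observe (no grammar induction, unlike the DCG learners below) and must invent random forms for every meaning they never observed. All names and parameters are hypothetical.

```python
import random
import string


def invent(rng, max_len=6):
    """A random form: 1 to max_len lower-case letters."""
    return "".join(rng.choice(string.ascii_lowercase)
                   for _ in range(rng.randint(1, max_len)))


def transmit(language, meanings, bottleneck, rng):
    """One generation: the learner observes `bottleneck` meaning-form pairs
    produced by the previous speaker, memorises them, and has to invent forms
    for every meaning it never observed."""
    observed = rng.sample(meanings, bottleneck)
    learned = {m: language[m] for m in observed}
    return {m: learned[m] if m in learned else invent(rng) for m in meanings}


def stability(old, new):
    """Fraction of meanings expressed by the same form in both generations."""
    return sum(old[m] == new[m] for m in old) / len(old)


rng = random.Random(42)
meanings = [f"m{i}" for i in range(100)]
lang = {m: invent(rng) for m in meanings}  # generation 0: random forms
for bottleneck in (100, 50, 10):
    print(bottleneck, stability(lang, transmit(lang, meanings, bottleneck, rng)))
```

With no bottleneck the rote-learned language is passed on intact; the tighter the bottleneck, the fewer idiosyncratic mappings survive a single generation, which is the pressure the text argues shapes the structure of grammars.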



Semantics The semantic space in the model (the space from which the meanings of the agents’ utterances are chosen) is made up of combinations of simple atomic concepts. These concepts include, for example:

john, heather, loves, admires, believes

These atomic elements can be combined into simple predicate-argument propositions, which may have hierarchical structure. For example:

admires(heather,john)
believes(heather,admires(john,heather))

The full set of propositions is infinite in principle because, using predicates such as believes, propositions can be nested without limit. In the simulations presented here, there are five “object” concepts (such as john, heather, etc.), five “action” concepts (such as loves, admires, etc.), and five “embedding” concepts (such as believes, knows, etc.).
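The meaning space can be sketched as nested tuples. This is a hypothetical Python encoding, not the paper’s: a meaning is assumed to be a `(predicate, arg1, arg2)` triple, the concept names are taken from the grammar listings later in this paper, and allowing both arguments of an action to be identical (e.g. loves(mary,mary)) is an assumption the text does not settle.

```python
from itertools import product
from typing import Union

# Concept names as they appear in the grammars listed later in the paper.
OBJECTS = ["john", "heather", "gavin", "mary", "pete"]
ACTIONS = ["loves", "admires", "hates", "likes", "detests"]
EMBEDDINGS = ["believes", "knows", "thinks", "says", "decides"]

# A meaning is (action, agent, patient) or (embedding, agent, meaning).
Meaning = tuple


def degree(m: Meaning) -> int:
    """Number of embeddings in a proposition (degree-0 = no embedding)."""
    pred, _, arg2 = m
    return 1 + degree(arg2) if pred in EMBEDDINGS else 0


def show(m: Union[Meaning, str]) -> str:
    """Render a meaning in the paper's predicate(arg,arg) notation."""
    if isinstance(m, str):
        return m
    pred, a1, a2 = m
    return f"{pred}({show(a1)},{show(a2)})"


def meanings(max_degree: int) -> list:
    """All meanings up to max_degree (assumes identical arguments are allowed)."""
    level = [(p, x, y) for p, x, y in product(ACTIONS, OBJECTS, OBJECTS)]
    result = list(level)
    for _ in range(max_degree):
        level = [(e, x, q) for e, x, q in product(EMBEDDINGS, OBJECTS, level)]
        result.extend(level)
    return result
```

Under these assumptions there are 125 degree-0 meanings, and each extra level of embedding multiplies the number of new meanings by 25 (5 embedding predicates × 5 subjects), so the space has no upper bound.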
Utterances The utterances of the agents are linear concatenations of the 26 lower-case letters of the alphabet. There is no principled limit to the length of an utterance, and the shortest utterance is 1 symbol long. Pairs of strings of symbols and propositions make up the E-domain objects in the simulation. So, if the agents happened to speak English, an E-domain object might look something like:

<heatherlovesjohn, loves(heather,john)>



Grammars The I-domain objects in the model are a simple form of definite-clause grammar (DCG). Critically, this choice of formalism does not build in compositionality or recursion. Consider an agent that could produce the string heatherlovesjohn meaning loves(heather,john). Here are two (of many, many possible) grammars that this agent may have internalised:

S/loves(heather,john) → heatherlovesjohn

S/p(x,y) → N/x V/p N/y
V/loves → loves
N/heather → heather
N/john → john



The S symbol in these grammars is the start symbol for the grammar, and N and V are arbitrary node labels. The material after the slash on the non-terminals is the semantic representation of that node. For left-hand-side non-terminals in this formalism, this semantic representation may be a fully specified proposition, an atomic concept, or a partially specified proposition (i.e. one with variables replacing concepts). The semantics of right-hand-side non-terminals may only be variables.

The second grammar is compositional. The meaning of a sentence is made up by combining two strings of type N with a string of type V, and the semantics for the sentence is built up from the semantics of these individual substrings using the variables x, p and y. The first grammar, however, is completely non-compositional: in no way is the meaning of the string a function of its parts.
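The contrast can be made concrete with a minimal Python sketch of top-down generation from rules of this kind. The encoding is hypothetical and is not Kirby’s implementation: a rule is a `(category, semantics, right-hand side)` triple, variables are strings beginning with `?`, and each right-hand-side item is either a terminal string or a `(category, variable)` pair.

```python
from typing import Optional, Union

Sem = Union[str, tuple]


def is_var(term: Sem) -> bool:
    return isinstance(term, str) and term.startswith("?")


def match(pattern: Sem, meaning: Sem, bindings: dict) -> Optional[dict]:
    """Match a rule's semantics against a ground meaning, extending bindings."""
    if is_var(pattern):
        if pattern in bindings:
            return bindings if bindings[pattern] == meaning else None
        return {**bindings, pattern: meaning}
    if isinstance(pattern, tuple):
        if not isinstance(meaning, tuple) or len(pattern) != len(meaning):
            return None
        for p, m in zip(pattern, meaning):
            bindings = match(p, m, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == meaning else None


def generate(grammar: list, category: str, meaning: Sem) -> Optional[str]:
    """Return a string for `meaning` of type `category`, or None if impossible."""
    for lhs, sem, rhs in grammar:
        if lhs != category:
            continue
        bindings = match(sem, meaning, {})
        if bindings is None:
            continue
        parts = []
        for item in rhs:
            if isinstance(item, str):  # terminal symbols
                parts.append(item)
                continue
            sub = generate(grammar, item[0], bindings[item[1]])
            if sub is None:
                break  # this rule fails; try the next one
            parts.append(sub)
        else:
            return "".join(parts)
    return None


# First grammar: a single holistic rule for the whole sentence.
HOLISTIC = [("S", ("loves", "heather", "john"), ["heatherlovesjohn"])]

# Second grammar: the sentence meaning is built from the meanings of its parts.
COMPOSITIONAL = [
    ("S", ("?p", "?x", "?y"), [("N", "?x"), ("V", "?p"), ("N", "?y")]),
    ("V", "loves", ["loves"]),
    ("N", "heather", ["heather"]),
    ("N", "john", ["john"]),
]
```

Both grammars produce heatherlovesjohn for loves(heather,john), but only the compositional one can also produce johnlovesheather for loves(john,heather), a meaning it never saw as a whole.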



3.1 Learning 

Clearly, the central part of the simulation is the agents’ ability to take E-domain 
objects (string/meaning pairs) and build I-domain objects (grammars). In fact, 
the agents are essentially induction machines designed to store pairs of strings 
and meanings, and find possible generalisations over these pairs (if such gen- 
eralisations exist). The algorithm presented here has been designed specifically 
with simulation tasks in mind — it is extremely simple and efficient. Although 
no claims are made for its efficacy as a practical learning algorithm for DCGs 
in general, it does model in a simple way the dual processes of rote learning 
and search for generalisation that should be the core of any theory of language 
acquisition. 
The algorithm is described in some detail in Kirby (1999b), and outlined informally⁴ in figure 2. It works in two stages for each string/meaning pair. Firstly, the individual pair is incorporated into the learner’s grammar by addition of the simplest possible rule that will generate that pair. For example, given the string/meaning pair:

<heatherlovesjohn, loves(heather,john)>

the following rule would be added to the grammar:

S/loves(heather,john) → heatherlovesjohn

The second stage of learning (which takes place after incorporation of each rule) is a process of searching for possible generalisations (essentially subsumptions) over pairs of rules. For example, given the pair of rules:

S/loves(heather,john) → heatherlovesjohn
S/loves(heather,gavin) → heatherlovesgavin

the algorithm will replace these with a more general rule (via the “chunking” step g.3):

S/loves(heather,x) → heatherloves N/x

The N category is a newly invented category, with its own rules added to the grammar:

N/john → john
N/gavin → gavin
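Chunking can be sketched for its simplest case: two holistic sentence rules whose flat meanings differ in exactly one argument. This Python sketch is an illustrative simplification, not the chunking algorithm of Kirby (1999b), which is not given here; the function name `chunk`, the default category name `N` and the greedy common-prefix/common-suffix split are all assumptions.

```python
from typing import Optional


def common_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def chunk(rule1, rule2, new_cat: str = "N") -> Optional[list]:
    """Generalise two rules S/(p,x,y) -> string that differ in one argument.

    Each rule is a (meaning, string) pair. Returns the new rules, or None if
    the pair does not fit this simple case.
    """
    (m1, s1), (m2, s2) = rule1, rule2
    diff = [i for i in range(3) if m1[i] != m2[i]]
    # Only a single differing argument (not the predicate) is handled.
    if len(diff) != 1 or diff[0] == 0:
        return None
    pos = diff[0]
    pre = common_prefix_len(s1, s2)
    # Longest common suffix of what remains after the shared prefix.
    suf = common_prefix_len(s1[pre:][::-1], s2[pre:][::-1])
    part1, part2 = s1[pre:len(s1) - suf], s2[pre:len(s2) - suf]
    if not part1 or not part2:
        return None
    general_meaning = tuple("?x" if i == pos else m1[i] for i in range(3))
    rhs = [s1[:pre], (new_cat, "?x"), s1[len(s1) - suf:]]
    return [
        ("S", general_meaning, [item for item in rhs if item != ""]),
        (new_cat, m1[pos], [part1]),
        (new_cat, m2[pos], [part2]),
    ]
```

On the pair heatherlovesjohn/heatherlovesmary this yields S/loves(heather,x) → heatherloves N/x with N/john → john and N/mary → mary. With the john/gavin pair from the text, the greedy split would also factor out the final n that both names happen to share (N/john → joh); the resulting grammar still generates both strings, but it shows that where a chunk boundary falls depends on the data.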



3.2 Invention 

Sometimes, the agents may wish to express meanings that their grammar provides no way of saying. In fact, in the initial stages of the simulation, this will be very frequent, since the population will start with no language at all. The model therefore requires some way for creative language invention to occur. The simplest way to model this is for the agents to produce a string of random symbols whenever they encounter a meaning they do not know how to produce. In the simulations presented here, the strings vary between 1 and 3 symbols in length.⁵
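In its simplest form, invention just draws a random string. This Python sketch assumes that the length (1 to 3) and each of the 26 letters are chosen uniformly; it ignores the partial-generation refinement described in the accompanying footnote, and the function name is hypothetical.

```python
import random
import string


def invent(rng: random.Random, min_len: int = 1, max_len: int = 3) -> str:
    """Return a random utterance of min_len..max_len lower-case letters."""
    length = rng.randint(min_len, max_len)  # randint is inclusive at both ends
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which matters when comparing simulation runs started from different seeds.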

⁴ The chunking algorithm is not given here for lack of space (see Kirby, 1999b, for details). The example in the text shows a typical application of chunking.

⁵ The complete invention algorithm is discussed elsewhere (Kirby, 1999b). The only complication for implementation purposes is what should be done if the agent knows how to say almost all of a particular meaning. The algorithm in effect carries out a top-down production for the meaning as best it can given the grammar, and generates a random string at the point in the tree where generation fails. This method ensures that invention never introduces new structure. In other words, the invented string will never be more compositional than grammatical strings that already exist in the agent’s language.
Induction algorithm: Given a meaning m, a string s and a grammar g

1.1 parse s using g; if the parse is successful, then return g.
1.2 form g', the union of g and S/m → s.
1.3 apply a generalisation algorithm to g':
    g.1 take a pair of rules ⟨r1, r2⟩ from g'.
    g.2 if there is a category label substitution c to c' that would make r1 identical to r2, then rewrite all c in g' with c', go to g.5.
    g.3 if r1 and r2 could be made identical by “chunking” a substring on either or both their right-hand sides into a new rule or rules, then create the new rules and delete the old ones in g', go to g.5.
    g.4 if r1’s right-hand side is a proper substring of r2’s and r1’s semantics is identical to either the top-level predicate or one of the arguments of r2’s semantics, then rewrite r2 in g' to refer to r1, go to g.5.
    g.5 delete all duplicate rules in g'.
    g.6 if any rules in g' have changed, go to g.1.
1.4 return g'.

Fig. 2. Outline of the induction algorithm.
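The control flow of Fig. 2 can be sketched as a fixed-point loop. In this Python sketch, `parses` stands in for step 1.1 and `generalise_steps` for steps g.2 to g.4; both are supplied by the caller and are assumptions, as are all the names. Only duplicate removal (g.5) and the repeat-until-stable loop (g.6) are implemented, so the sketch shows the shape of the algorithm rather than its substance.

```python
from typing import Callable, List, Optional

# A rule is a hashable triple: (category, semantics, right-hand side tuple).
Step = Callable[[list], Optional[list]]  # returns a rewritten grammar or None


def induce(grammar: list, meaning, utterance: str,
           parses: Callable[[list, object, str], bool],
           generalise_steps: List[Step]) -> list:
    """Incorporate one meaning/string pair into a grammar (cf. Fig. 2)."""
    if parses(grammar, meaning, utterance):         # 1.1: already covered
        return grammar
    g = grammar + [("S", meaning, (utterance,))]    # 1.2: add S/m -> s
    changed = True
    while changed:                                  # g.6: repeat until stable
        changed = False
        for step in generalise_steps:               # g.1-g.4 (caller-supplied)
            rewritten = step(g)
            if rewritten is not None:
                g, changed = rewritten, True
                break
        deduped = list(dict.fromkeys(g))            # g.5: drop duplicate rules
        if len(deduped) != len(g):
            g, changed = deduped, True
    return g
```

Each step must return None once it no longer applies; otherwise the loop cannot terminate. In a fuller version `parses` would be a real parser and the steps would implement category merging (g.2), chunking (g.3) and substring replacement (g.4).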



3.3 Simulation cycle 

For the simulations reported here, the simplest possible population model is 
used. At any one time there are only two individuals: a learner and a speaker. 
The speaker is required to produce an utterance for a meaning chosen at random, 
either by using its grammar or by employing invention if the grammar cannot 
generate a string (if the speaker invents, then it uses its utterance as input to its 
own learning to ensure consistency). The learner is given the randomly chosen 
meaning and the speaker’s string, and uses the pair as input to the induction 
algorithm. This process is repeated a fixed number of times, and then the speaker 
is removed, the learner becomes a speaker, and a new learner with an empty 
grammar is introduced. 
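The speaker/learner cycle can be sketched as below. To keep the sketch self-contained, the learner is a hypothetical rote learner (a lookup table with no generalisation, i.e. only step 1.2 of Fig. 2) and meanings are plain integers; both are simplifications for illustration, and the names `run` and `invent` are assumptions.

```python
import random
import string


def invent(rng: random.Random) -> str:
    """Random 1-3 letter utterance (hypothetical helper, cf. Section 3.2)."""
    return "".join(rng.choice(string.ascii_lowercase)
                   for _ in range(rng.randint(1, 3)))


def run(generations: int, utterances: int, n_meanings: int, seed: int = 0):
    """Iterate speaker -> learner transmission; return coverage per generation.

    Coverage is the fraction of the speaker's utterances produced without
    invention. The rote learner stores exact meaning/string pairs only.
    """
    rng = random.Random(seed)
    speaker: dict = {}  # the first speaker has no language at all
    coverage = []
    for _ in range(generations):
        learner: dict = {}  # each new learner starts with an empty grammar
        known = 0
        for _ in range(utterances):
            meaning = rng.randrange(n_meanings)
            if meaning in speaker:
                known += 1
            else:
                # Invent, and let the speaker learn its own invention.
                speaker[meaning] = invent(rng)
            learner[meaning] = speaker[meaning]  # rote learning only
        coverage.append(known / utterances)
        speaker = learner  # the learner becomes the next speaker
    return coverage
```

With a small meaning space every meaning is heard each generation and coverage reaches 1.0; with far more meanings than utterances per generation, a rote learner can never pass the whole language on, which is the bottleneck discussed in Section 5.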



4 Experiments 

The graphs in figure 3 show the average behaviour of ten simulation runs starting with different random seeds. In all the runs, each speaker first tries to produce 50 randomly chosen degree-0 meanings (i.e. those with no embedding), then 50 randomly chosen degree-1 meanings (with one embedding) and finally 50 randomly chosen degree-2 meanings (with two embeddings). The graphs show the proportion of the meanings of each type that the speakers were able to produce without resorting to random invention. The other line on the graphs shows a measure of the size of the grammars (number of rules divided by 300 to fit it on the same plot).
Fig. 3. Average of ten simulation runs, plotted on two different time-scales. Coverage
of the different meaning types increases and the size of the grammar decreases over 
time. 



To demonstrate more clearly what the behaviour of the evolving system is, 
we can examine the grammars of one population over time in some detail: 

The emergence of vocabulary In the very early stages of the simulation, the 
speakers are not able to express many of the meanings without resorting to 
random invention — this is due to the fact that the population starts with no 
language at all. Immediately, there is a rapid rise in the size of grammars, as 
newly invented utterances are acquired by learners. The listing below shows part
of an early grammar, which is best thought of as simply a long list of unrelated, 
idiosyncratic vocabulary items. 



S/loves(pete,mary) → axk
S/thinks(john,likes(gavin,heather)) → ew
S/decides(heather,says(mary,detests(john,pete))) → pq
S/admires(pete,heather) → njl
S/decides(john,believes(pete,hates(pete,mary))) → vz
S/admires(gavin,mary) → s
S/knows(mary,decides(mary,loves(gavin,pete))) → c
S/detests(pete,john) → my
S/believes(pete,believes(mary,likes(mary,heather))) → z
S/likes(gavin,mary) → but
S/loves(gavin,pete) → poe
S/thinks(gavin,loves(heather,gavin)) → oeb
S/decides(mary,hates(gavin,john)) → vri
S/detests(heather,gavin) → wb
S/thinks(mary,hates(john,pete)) → y
S/thinks(john,hates(mary,pete)) → pff
S/decides(pete,detests(heather,mary)) → fi
S/believes(gavin,decides(gavin,admires(mary,john))) → go
S/believes(pete,knows(gavin,loves(heather,gavin))) →



The emergence of compositionality Figure 3 shows that after a brief initial rise 
the size of grammars soon starts to fall, and the proportion of degree-0 meanings 
that are expressed without invention increases. The grammar fragment below 
shows how this happens. Here, by the 100th generation, a compositional encod- 
ing has emerged, although it is only reliably used for degree-0 meanings. This 
particular grammar has two categories for arguments (arbitrarily named A and 
C), which can be thought of as case-marked nominals, and one verbal category
for predicates (named D). (N.B. there are many other idiosyncratic rules for 
more complex meanings in this grammar that are not shown.) 



S/p(x,y) → gj C/y z A/x D/p
A/gavin → dl
A/heather → tej
A/john → n
A/mary → qp
A/pete → h
C/gavin → x
C/heather → ovp
C/john → i
C/mary → h
C/pete → y
D/admires → b
D/detests → gl
D/detests → xe
D/hates → c
D/likes → e
D/loves → m
S/knows(john,decides(pete,loves(mary,john))) → qixoyia
S/decides(x,thinks(heather,likes(pete,mary))) → C/x jwyjtejdbznuy



Here is an example sentence from this language with an English gloss (NOM 
and ACC stand for nominative and accusative): 

(1) gj ovp z qp b 

Heather-ACC Mary-NOM admires 
“Mary admires Heather” 
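Because this grammar is compositional, a hearer can recover the meaning of example (1) piece by piece. The Python sketch below hard-codes only the degree-0 sentence rule S/p(x,y) → gj C/y z A/x D/p and the lexical rules listed above; it assumes the morphemes are already separated by spaces as in the gloss, whereas the agents actually hear an unsegmented string. The function names are hypothetical.

```python
# Lexical rules from the generation-100 grammar listed above.
A = {"gavin": "dl", "heather": "tej", "john": "n", "mary": "qp", "pete": "h"}
C = {"gavin": "x", "heather": "ovp", "john": "i", "mary": "h", "pete": "y"}
D = {"admires": ["b"], "detests": ["gl", "xe"], "hates": ["c"],
     "likes": ["e"], "loves": ["m"]}


def invert(lexicon: dict) -> dict:
    """Map each form back to its meaning (D/detests has two forms)."""
    table = {}
    for meaning, forms in lexicon.items():
        for form in ([forms] if isinstance(forms, str) else forms):
            table[form] = meaning
    return table


def parse_degree0(sentence: str):
    """Parse 'gj C/y z A/x D/p' into the proposition p(x, y), or return None."""
    words = sentence.split()
    if len(words) != 5 or words[0] != "gj" or words[2] != "z":
        return None
    y = invert(C).get(words[1])  # accusative (case-marked) nominal
    x = invert(A).get(words[3])  # nominative nominal
    p = invert(D).get(words[4])  # predicate
    if None in (x, y, p):
        return None
    return (p, x, y)
```

The fixed positions of gj and z, together with the separate A and C nominal sets, are what let the sketch tell agent from patient without any other cue.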

The emergence of recursion Finally, the percentage of degree-1 and degree-2 meanings that are expressible without invention increases as the size of the grammar continues to decrease. By generation 1000, an extremely concise language has emerged that can completely express the infinite meaning space (the entire grammar is shown below). This is done by using two verbal categories (B and D), one nominal category (A), and two sentence rules, one of which is recursive. In other words, a highly regular, expressive, syntactically structured language has emerged. This language persists without significant change for the rest of the simulation (which was terminated after 10000 generations).



S/p(x,y) → gj A/y f A/x B/p
S/p(x,q) → i A/x D/p S/q
A/gavin → dl
A/heather → tej
A/john → n
A/mary → qp
A/pete → h
B/admires → b
B/detests → wp
B/hates → c
B/likes → e
B/loves → m
D/believes → g
D/decides → u
D/knows → ipr
D/says → p
D/thinks → m



Here are examples from this language, again with glosses in English (notice 
how little the degree-0 structure has changed, apart from a simplification of the 
nominal system so that the nominative nouns are now used for accusatives as 
well):

(2) gj tej f qp b 

Heather Mary admires 
“Mary admires Heather” 
(3) i h ipr gj tej f qp b

Pete knows Heather Mary admires 
“Pete knows that Mary admires Heather” 

(4) i dl p i h ipr gj tej f qp b 

Gavin says Pete knows Heather Mary admires 
“Gavin says that Pete knows that Mary admires Heather” 
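The two sentence rules of the final grammar are enough to generate examples (2) to (4). In this Python sketch the lexicon is copied from the listing above; placing decides, says and thinks in category D follows the statement that there are two verbal categories and example (4), and should be read as a reconstruction of damaged rule labels. Morphemes are joined with spaces only for readability, and the function name `say` is hypothetical.

```python
A = {"gavin": "dl", "heather": "tej", "john": "n", "mary": "qp", "pete": "h"}
B = {"admires": "b", "detests": "wp", "hates": "c", "likes": "e", "loves": "m"}
D = {"believes": "g", "decides": "u", "knows": "ipr", "says": "p", "thinks": "m"}


def say(meaning) -> str:
    """Generate with S/p(x,y) -> gj A/y f A/x B/p and S/p(x,q) -> i A/x D/p S/q."""
    p, x, rest = meaning
    if p in D:  # recursive rule: the embedded proposition is itself an S
        return " ".join(["i", A[x], D[p], say(rest)])
    return " ".join(["gj", A[rest], "f", A[x], B[p]])
```

Because the second rule calls `say` on its own output category, 15 lexical forms plus two sentence rules cover meanings of any depth of embedding.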

5 Conclusion: evolution through learning 

The computational model shows how significant linguistic evolution can occur 
without any biological evolution. The learners in the simulation are not innately 
constrained to acquire only syntactic languages, nor are they rewarded in any 
way for their ability to communicate. The simulation is not seeded with any 
particular language — in fact, the first speaker has no linguistic knowledge to 
draw on whatsoever — and the only way information is transmitted over time 
is by observational learning. And yet, from this apparently simple setup, a
linguistic system emerges. First, the language of the community is an impover- 
ished vocabulary-like system, with idiosyncratic signals being assigned to whole 
complex meanings. Later, a compositional mapping emerges, with a lexicon for 
atomic concepts and a simple way of putting these together to form sentences. 
Finally, recursion evolves so that the agents are able, with only a small vocabu- 
lary and a couple of rules of combination, to express an infinite range of possible 
meanings. 

The obvious questions that we are left with are “why does the language evolve 
in this way?” and “how general is this result?” Ultimately, a satisfactory answer 
to both these questions may well lie in a mathematical analysis of the properties 
of the transmission of information via observational learning. For the moment, 
however, it seems that the learning bottleneck that exists between the E-domain 
and I-domain in figure 1 must be what is driving the system to evolve. 

Firstly, consider a non-compositional system such as the early ones in the 
simulation run described above. Here, the mapping between meanings and strings 
is such that each rule in the grammar can at best only account for one meaning- 
string pair. This necessarily means that to learn any particular rule in such a 
grammar, a learner must be exposed to exactly that pairing. The chance that a
particular non-compositional I-language will survive over time is slim in this
situation, because every string-meaning pairing in that I-language will have to
appear in the input data to the learner. Given that the number of utterances that
the learner hears is smaller than the number of meanings that a speaker might be
called upon to produce, a randomly chosen, complete, non-compositional
language cannot be transmitted through the learning bottleneck. Notice that this
must be true for any learning algorithm.
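The bottleneck argument can be quantified under a simple assumption that the text does not state explicitly: if each of k utterances expresses a meaning drawn uniformly and independently from |M| meanings, a given meaning is observed with probability 1 − (1 − 1/|M|)^k, which is at most k/|M|. For a non-compositional language this is also the expected fraction of its rules that a learner can reconstruct. A small Python check, with hypothetical sizes:

```python
def observed_fraction(n_meanings: int, k: int) -> float:
    """Expected fraction of meanings seen at least once in k uniform draws."""
    return 1.0 - (1.0 - 1.0 / n_meanings) ** k


# Hypothetical sizes: 50 utterances, 100 possible meanings.
frac = observed_fraction(100, 50)
print(round(frac, 3))  # 0.395, below the union bound k/|M| = 0.5
```

Whatever the learner does with its data, unseen holistic pairings cannot be recovered, so under these assumptions roughly 60% of such a language would be lost in a single generation.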

Now, contrast this situation with one in which there is compositionality
in the language. Given an E-language in which there is some pattern to the
string-meaning mapping, it is no longer true that a particular pairing can only
persist if it is directly observed. A learner can generalise from an observed set of
string-meaning pairs to ones that are unseen. This means that a compositional
language will be far more transmittable through the bottleneck. A smaller set of
example pairings will be required in order to reconstruct the language accurately.
Again, as long as the learning algorithm is able to exploit pattern, or is biased
towards generalisation, then this is true in general.

It seems as if the process of the transmission of learned behaviour has certain 
general properties whatever our assumptions about the learners. These proper- 
ties will tend to favour the emergence of systems with patterns that can be 
exploited by learning. Where the behaviour that is being learned is a mapping 
between two structured spaces, we can expect the process of transmission to give 
us mappings that preserve structure. For the case of language — where these 
spaces are propositional meanings on the one hand, and linear strings on the 
other — the mappings that will inevitably emerge will be syntactic. 
Abstract. Some recent Artificial Life models have attempted to explain the
origin of linguistic diversity with varying conclusions and explanations. We 
posit, contrary to some existing Artificial Life work, that linguistic diversity 
should naturally emerge in spatially organised populations of language learners, 
and this is supported by our experimental work and by recent literature. 



1 Introduction 

There exists a rich literature on linguistic diversity, much focussed on tracing the 
roots and relationships between languages and regional variations in dialect. This 
division between language and dialect is not clear-cut. Two different languages may 
be mutually intelligible while two dialects of the one language may possess a low 
degree of mutual intelligibility [1]. Dialect continua make the distinction between 
languages and dialects harder still. These chains of dialects can cover large areas and 
cross many national boundaries, each dialect intelligible to speakers of neighbouring 
dialects. Yet dialects at the ends of the chains may be mutually incomprehensible. 

Artificial Life inspired research has shown how populations of agents can negotiate 
a language and how communication schemes can evolve. Generally, these attempt to 
negotiate a single language common to all agents. There exists a small body of 
Artificial Life based work which studies linguistic diversity. These have tended to 
ignore the simple hypothesis that linguistic diversity may be the non-adaptive result 
of spatially distributed learning. The model presented here investigates this. 



2 Dialect in Models of the Evolution of Language 

A small number of Artificial Life papers address the issues of language diversity and 
innovation directly. Maeda and Sasaki, [2], consider the effects of contact between 
different linguistic groups, demonstrating the subsequent language re-organisation. 

Kirby, [3], studies the evolution of Universal Grammar and shows, implicitly, how 
spatial organisation can affect the evolution of languages. Hashimoto and Ikegami, 
[4], show how pressure for greater expressive and parsing ability leads to increasingly 
complex grammars which are able to generate and interpret wider ranges of signals. 
Arita and Taylor [5] argue that spatial distribution can explain the origin of 
linguistic diversity. Inheritance and mutation are important factors for generating
diversity within this model, where learning reduces diversity. Yet in human language 
it is the cultural, not genetic, factors which determine which language is learned. In 
Arita and Koyama [6] an innate language is used to study the evolutionary dynamics 
of vocabulary sharing. One conclusion is that mutation rate is important in defining 
the class of vocabulary sharing. This is not surprising with a heritable vocabulary, but 
what this represents linguistically is not explained. Steels and Kaplan [7] demonstrate 
stochasticity, from errors in language use, as a source of change in language. The 
formation of multiple distinct dialects is not covered in this study, in which the
agents are not spatially organised. While Nettle and Dunbar [8] cite examples and use 
simulation to demonstrate that dialect has adaptive value in a social context, we will 
show that adaptive explanations are not required for evolution of linguistic diversity. 



3 A Model for Studying the Evolution of Dialect 

We adapt an earlier model [9,10] in which the emergence of dialects was observed but 
not studied, focussing now on the evolution of dialects. Genetic evolution of agents is 
removed, and learning modified to take place between successive generations of 
agents. Our model of language is a greatly simplified one, in which agents learn to 
map signals sent by others to internal states. These internal states could be said to 
relate to ‘meanings’ or to external observations. 

A language agent is modelled by a fully connected artificial neural network (ANN)
which has two layers of nodes, with N inputs (the internal state, x) and M outputs (the
signal, y). The output layer has an additional bias node.

The internal state is a sparse bipolar vector (only one node set to +1). This is fed
forward to determine the agent’s signal for that state, each output being thresholded to
a bipolar value (1). Signals are arbitrary bipolar vectors. For interpretation of signals,
competition is applied at the internal state layer, so any signal fed back from the
language layer corresponds to only one internal state. For M output neurons, there are
2^M possible signals in the language, and for N input neurons there are N possible
states. In experiments detailed here, M = N = 3. Agents have three possible internal
states and eight possible signals, providing redundancy in the signalling ability.

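$$ y'_j = \sum_{i=0}^{N-1} w_{ij}\, x_i, \quad \text{then } y_j = 1 \text{ if } y'_j > 0, \; y_j = -1 \text{ if } y'_j < 0 \qquad (1) $$

$$ \Delta w_{ij} = \eta\, (x_i - x'_i)\, y_j \qquad (2) $$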

[11] suggests using transmission behaviour to train language reception and 
reception behaviour to train language production. Accordingly, the signal from a 
teacher is presented at the output layer of the learner and fed back to produce an 
internal state. The error between the teacher’s and the learner’s internal states is used 
for learning (2), performed when the learner misclassifies the signal. 
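A minimal Python sketch of one agent follows, under assumptions the text leaves open: the bias node is not modelled, a net input of exactly 0 is thresholded to −1, interpretation ties go to the lowest-numbered state, and the learning rate η = 0.1 is hypothetical. Production applies (1); interpretation feeds a signal back through the same weights and lets the internal-state units compete; learning applies (2) only when the pupil misclassifies, with x the teacher’s state and x' the pupil’s interpretation.

```python
from typing import List

N_STATES, N_SIGNAL = 3, 3  # N = M = 3 as in the experiments
ETA = 0.1  # hypothetical learning rate


def one_hot(state: int) -> List[int]:
    """Sparse bipolar state vector: +1 for the active state, -1 elsewhere."""
    return [1 if i == state else -1 for i in range(N_STATES)]


class Agent:
    def __init__(self, weights=None):
        # w[i][j] connects internal-state unit i to signal unit j.
        self.w = weights or [[0.0] * N_SIGNAL for _ in range(N_STATES)]

    def produce(self, state: int) -> List[int]:
        """Feed the state forward and threshold each output, eq. (1)."""
        x = one_hot(state)
        net = [sum(self.w[i][j] * x[i] for i in range(N_STATES))
               for j in range(N_SIGNAL)]
        return [1 if v > 0 else -1 for v in net]

    def interpret(self, signal: List[int]) -> int:
        """Feed a signal back; the most active state unit wins."""
        act = [sum(self.w[i][j] * signal[j] for j in range(N_SIGNAL))
               for i in range(N_STATES)]
        return act.index(max(act))

    def learn(self, signal: List[int], state: int) -> None:
        """Apply eq. (2) only when the signal is misclassified."""
        guess = self.interpret(signal)
        if guess == state:
            return
        x, x_hat = one_hot(state), one_hot(guess)
        for i in range(N_STATES):
            for j in range(N_SIGNAL):
                self.w[i][j] += ETA * (x[i] - x_hat[i]) * signal[j]
```

In this sketch production and interpretation share one weight matrix, so training a pupil’s reception also reshapes the signals it will later produce for its own pupils.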
The first generation of agents are initialised with random weight values and then
provide training for one another for t training rounds, as in the following algorithm:

For each agent (in random order) 

Pick another agent to be teacher 
Generate a training signal from teacher 
Train pupil on signal and state 

When each generation is created, the agents are born with zero-valued weights. The
new agents learn first from agents in the parent generation for t training rounds, and 
then further negotiate language amongst themselves for tH rounds. 



4 Experimental Results 

Two sets of experiments were run. In the first set, agents inhabit a one-dimensional 
world. The second set was performed in a dimensionless world, in which every agent 
is equally likely to interact with every other agent. 



4.1 Evolution of Dialects in a 1-D World 

Agents are organised linearly and teacher agents are selected according to a normal 
distribution curve centred on the current learner agent. An agent may not be its own
teacher. To visualise the languages, the three bits of a signal are used to set red, green 
and blue colour values of pixels. Plotting the pixels for each agent for each state, three 
columns are drawn each generation. Adding the columns for each generation to those 
of the previous, three bars are formed. A number of simulations were run, the results 
of one are shown in Fig. 1. The use of different signals can be seen to spread and 
recede over time. 
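Teacher selection in the two worlds can be sketched as follows. The standard deviation of the normal distribution is not given in the text, so `sigma` is a hypothetical parameter; redrawing offsets that fall off either end of the line (rather than wrapping around) is likewise an assumption, and `pick_teacher` is an illustrative name.

```python
import random
from typing import Optional


def pick_teacher(learner: int, n_agents: int, rng: random.Random,
                 sigma: Optional[float] = None) -> int:
    """Choose a teacher index for `learner`; never the learner itself.

    With sigma=None every other agent is equally likely (dimensionless world);
    otherwise offsets are drawn from a normal distribution centred on the
    learner (1-D world), redrawing any that miss the population.
    """
    if n_agents < 2:
        raise ValueError("need at least two agents")
    if sigma is None:
        teacher = rng.randrange(n_agents - 1)
        return teacher if teacher < learner else teacher + 1
    while True:
        teacher = learner + round(rng.gauss(0.0, sigma))
        if teacher != learner and 0 <= teacher < n_agents:
            return teacher
```

For example, `pick_teacher(60, 120, rng, sigma=2.0)` almost always returns a near neighbour of agent 60 in a population of 120, while `pick_teacher(60, 120, rng)` draws uniformly from the other 119 agents.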




Fig. 1. The ‘evolution’ of signals representing three states in a spatially distributed population
of 120 agents. Each bar shows the evolution of signal use for one of the three internal states.



A measure of the difference between signals can be found using the Hamming 
distance. With a small difference it is possible that both signals are interpreted the 
same, due to redundancy in the signals. The Hamming distance between signals of
neighbours, plotted over the whole population, can indicate language boundaries, and 
is shown for the final generation of the experiment above (Fig. 2 left). There is a high 
degree of change over the population. Despite this, signals used by adjacent agents for 
a particular meaning rarely vary by more than one bit. This could indicate a dialect 
continuum across the population. In a number of places, the signals used for two or all 
three of the meanings change together. These might indicate linguistic discontinuities. 
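The neighbour measure plotted in Fig. 2 (left) can be sketched as follows, assuming each agent’s language is stored as a list of three bipolar signals, one per internal state; the function names are hypothetical.

```python
from typing import List, Tuple

Signal = List[int]  # bipolar vector, e.g. [1, -1, 1]


def hamming(a: Signal, b: Signal) -> int:
    """Number of positions at which two signals differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbour_distances(languages: List[List[Signal]]) -> List[Tuple[int, int]]:
    """For each adjacent pair of agents, return (max, total) Hamming distance
    over the signals they use for the same internal state."""
    result = []
    for left, right in zip(languages, languages[1:]):
        dists = [hamming(a, b) for a, b in zip(left, right)]
        result.append((max(dists), sum(dists)))
    return result
```

A long run of pairs whose maximum is 1 is consistent with a dialect continuum, while a pair in which two or three signals change at once shows up as a large total.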




Fig. 2. The maximum (for any one signal) and total (over three signals) Hamming distances
between signals used by adjacent agents in a spatially distributed population (left) and the 
percentage of communicative successes over the spatially distributed population (right) 

Fig. 2 (right) plots the communicative success of the agents. Despite the high 
amount of diversity in the signals used, most agents achieved perfect scores. The 
lowest scoring agent correctly interpreted 82 percent of signals, showing that a dialect 
continuum exists from one extreme end of the population to the other. 



4.2 Evolution of Dialects without Spatial Organisation 

In these experiments agents interact with each other with uniform probability. No 
distinct dialects are observed, and populations quickly negotiate a global language. 
Signal redundancy allows the languages to use two or more signals for each state, 
where each signal is associated with only one state (Fig. 3). Eventually even this 
diversity disappears, and a one-to-one signal-state mapping is established. Using signal
uncertainty as a measure, we have determined that there is significantly less diversity
under these conditions (mean uncertainty of 0.75 bits). In contrast, where agents are
spatially organised no significant diversity loss is observed or measured after 10,000
or even 100,000 generations (mean uncertainties of 2.14 and 2.05 bits respectively).
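The text does not define signal uncertainty precisely. One plausible reading, offered here only as an assumption, is the Shannon entropy (in bits) of the distribution of signals that the population uses for a given state; with 3-bit signals it ranges from 0 (every agent uses the same signal) to 3 bits (all eight signals used equally often). A Python sketch:

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def uncertainty(signals: Iterable[Hashable]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of signals."""
    counts = Counter(signals)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Averaging this value over the three internal states would give a single population-level figure comparable in scale to those reported above.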




Fig. 3. The evolution of signals without spatial distribution 
5 Conclusions

While adaptive explanations of language variation show how dialect may be 
exploited, our model shows a linguistic system that maintains a high level of diversity 
without adaptive benefits. Inheritance and mutation are also absent in our model and 
no errors occur in information transmission, so these are also not required to maintain 
diversity. Stochasticity occurs in the random selection of agents and in choosing a 
state to signal, and is not completely removed. The role of stochasticity for linguistic 
innovation is not disputed, and in our model a population without diversity cannot 
develop it. There are qualitative similarities between spatial organisation of dialects in 
our model and the organisation of human dialects. The dialects in our model form a 
continuum, connecting dialects that are, at the extremes, not mutually intelligible. 

We conclude that sustained linguistic diversity can emerge from the interactions of 
agents, as a consequence of learning and the distribution of agents. Support for this 
comes from recent work [12] which uses evidence from global linguistic data, 
suggesting that linguistic diversity is related to ecological risk. Ecological factors 
determine the social networks formed by societies, which in turn determine linguistic 
diversity. This matches our conclusion that linguistic diversity emerges as a
result of constraints on interactions between language learners. We are currently 
working to determine the effects different parameters have on diversity. 
Abstract. What permits some systems to evolve and adapt more effectively than others? Gell-Mann [3] has stressed the importance of “compression” for adaptive complex systems. Information about the environment is not simply recorded as a look-up table, but is rather compressed
in a theory or schema. Several conjectures are proposed: (I) compression
aids in generalization; (II) compression occurs more easily in a “smooth”,
as opposed to a “rugged”, string space; and (III) constraints from compression make it likely that natural languages evolve towards smooth
string spaces. We have been examining the role of such compression for 
learning and evolution of formal languages by artificial agents. Our sys- 
tem does seem to conform generally to these expectations, but the trade- 
offs between compression and the errors that sometimes accompany it 
need careful consideration. 



1 Introduction 

Why are some systems more adaptable than others? A core feature of nearly 
all successful adaptive systems is the ability to distill experience into schemas, 
models or theories and then employ those abstracted structures in new circum- 
stances. Information about the environment is not simply recorded as a look-up 
table, with generalization to new situations happening only at look-up time. 
Rather, it is plausible that salient features of situations are noted, and then 
departures from expectations are noted. This is the essence of compression. 

Gell-Mann [3] has argued that a compressed form “is usually approximate, 
sometimes wrong, but it may be adaptive if it can make useful predictions in- 
cluding interpolation and extrapolation and sometimes generalize to situations 
very different from those previously encountered. In the presence of new infor- 
mation from the environment, the compressed schema unfolds to give predictions 
or behavior or both.” We would like to know more about the role of compression 
in adaptation. For example, can we identify features of compression that make 
some forms more or less likely to be successful? What sorts of compression are 
best? How much is desirable? 

This perspective on learning is not new, especially for cultural evolution. It 
is evident, for example, that there is frequently a practical necessity to simplify 
things so that we can understand them. Hence the ubiquitous use of simplified 
models in science. Which models are themselves best is also a matter of simplicity. William of Ockham, among others, has observed that “it is futile to do
with more what can be done with less” [18]. In another domain, Chomsky [2] 
has placed the desirability of compressed, “minimal” grammar at the heart of 
his theory of human language. Nonetheless, there have been few attempts to 
explore, in a systematic way, how compression alone may affect adaptation. 

In this paper we first state some heuristics that we conjecture to be generally, 
if not always, true. We then introduce some elementary definitions and principles 
about data compression and formal languages. We have chosen to work with 
formal languages because it is possible to discuss them concretely and because 
such systems quite clearly show changes in their ability to generalize [14, 6, 11, 10]. We next discuss how compression can occur as an agent learns a language by
hearing examples of it. We also describe some experiments where agents learn 
the languages with compressed grammars, and where the languages evolve as 
a result. Finally, we discuss some features of compression and evolution in our 
system in the light of our conjectures. We emphasize that while our discussion 
will be directed mostly to cultural evolution (scientific theory and language), we
believe these heuristics apply to other adaptive complex systems, such as organic 
evolution, immune systems, and neural networks. 



2 Conjectures about compression and adaptation 

Conjecture 1. Compression aids in generalization. 

From a series of observations like “Crow A is black” and “Crow B is black”
we compress a look-up table of crows and their colors to the generalization that 
“All crows are black.” The generalization is clearly smaller, more “compressed” 
than a list of many instances. The precise characterization of the circumstances 
in which such generalization is appropriate, the problem of induction, is a long- 
standing philosophical problem. 

The history of science is a history of finding generalizations that allow a 
succinct statement of the facts. Following the invention of the spectroscope, the 
spectral emissions from various elements, including hydrogen, were cataloged. 
In 1885 J. J. Balmer discovered a formula that described the frequencies of the
hydrogen emission lines both compactly and accurately, though it was simply a
formula without a model behind it. In 1913, Niels Bohr published a model of the
atom that compactly described the emissions from hydrogen, and several other
atoms, in a very compressed and desirable form.

While the history of science can be regarded as a series of successively better 
compressions, it should also be recognized that the resulting compressions may 
make predictions that are only approximate or even wrong. Although models can 
give us insight into systems, the actual model used greatly affects the predictions 
that can be made and the types of behaviors that can be explained. Many 
scientific theories, no matter how well they might compress a set of observations, 
are subsequently proven wrong. 
Conjecture 2. Compression occurs more easily in a “smooth”, as opposed to
“rugged”, string space.

Related or connected sets of observations form a better basis for generalization
than do dissimilar or unrelated ones. We would like to be concrete about the
meaning of “related or connected” in this context. Kauffman [8] has explored the
use of adaptive landscapes in a variety of contexts, and we build on his example
by exploring a string space of languages below.

Unrelated observations, like Lord Morton’s mare for Darwin’s theory of inheritance,
can make theory formation quite difficult [13]. As a result, it is generally
thought better to focus initial scientific study on simple and well-defined
systems, where the smoother space is more easily explored, as Mendel did.

Smoothness and ruggedness are, to some extent, a property of the substitution
operators. In molecular evolution, for example, adding or deleting tandem
repeats of 2 or 4 nucleotides is often easier than adding or deleting a single
nucleotide [21].

Conjecture 3. Constraints from compression make it likely that natural 
languages come to have smooth string spaces. 

Language learners are regarded as systems that aim to identify rule systems 
that describe the (infinite) language of the community on the basis of finite 
evidence. This can only be successful in certain circumstances, whether one as- 
sumes that success is perfect identification in the limit (the “Gold” paradigm) [4], 
or that success is feasible convergence to arbitrarily good approximate identifi- 
cation [9,20]. Since humans clearly learn to speak natural languages that can 
generate an infinite number of sentences, and do so largely from examples and 
without conscious awareness of grammatical rules, most linguists believe that, 
roughly speaking, when a young person is confronted with a new sentence they 
will “try out” candidate grammars and then, from those grammars that can 
accommodate it, choose the one that is simplest [1,5,19]. 

Of course, human language learners hear ungrammatical sentences and sen- 
tence fragments, but this “noise” apparently does not make communication with 
the community impossible. We postulate that sufficiently rare complications are 
typically not incorporated; they will be catalogued as exceptions or simply
ignored. As a result, there will be a natural selection for languages that are
progressively smoother. Because finding the simplest grammar consistent with a set
of examples is, in general, computationally hard, this is perhaps analogous to the
way that biological systems sometimes “solve” other NP-complete problems. They
seem to employ only those parts of the problem space that can be solved simply,
even though the general problem might be intractable [15].

3 Data compression 

Data can be compressed only when it contains some regularity that can be
exploited [17]. When a regularity exists, listing every instance of it separately
creates a redundancy that can be eliminated.
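As a loose illustration of this point (using a general-purpose compressor, not
the MDL scheme adopted in this paper), Python's standard `zlib` module shrinks a
highly repetitive byte string far more than an irregular one of the same length.
The input strings are arbitrary choices for the demonstration:

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of `data` after DEFLATE compression at the highest level."""
    return len(zlib.compress(data, 9))


# A highly regular input: the pattern "ab" repeated 500 times (1000 bytes).
regular = b"ab" * 500

# An irregular input of the same length, drawn from a seeded pseudo-random generator.
rng = random.Random(0)
irregular = bytes(rng.getrandbits(8) for _ in range(len(regular)))

print(len(regular), compressed_size(regular))      # shrinks drastically
print(len(irregular), compressed_size(irregular))  # barely shrinks, or grows slightly
```

The repetitive string is reduced to a short description of the pattern, whereas
the pseudo-random bytes offer no regularity for the compressor to exploit.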
It is important to recognize that some schemes might provide very efficient
coding of data, but would involve lengthy and complicated specifications of the
decoder. The decoder could, for example, simply contain a list of any finite set
of strings to be encoded. A better measure of compression would recognize both
costs. The minimum description length (MDL) principle we use accomplishes
this. In the formal language framework the sum of the grammar-encoding-length
(the cost of the rule set) and the data-encoding-length (the cost of the coding of
the data) is minimized [12,16].



4 Formal languages 

A (formal) language is a set of strings of symbols drawn from some alphabet.
Here we shall be concerned only with regular languages, languages that can be
accepted by a finite automaton. Such automata can be described by a quintuple,
$(Q, \Sigma, \delta, q_0, F)$, where $Q$ is a set of states, $\Sigma$ is an input
alphabet, $q_0 \in Q$ is the initial state, $\delta \subseteq Q \times \Sigma \times Q$
is a transition relation, and $F \subseteq Q$ is the set of final states [7]. An
automaton is said to be deterministic if the transition relation $\delta$ is a
function $\delta : Q \times \Sigma \to Q$ with respect to its first two arguments.
Associated with each finite state automaton is a transition diagram, with an arc
connecting states $q_1$ and $q_2$ labeled with vocabulary element $a$ if, and only
if, $(q_1, a, q_2) \in \delta$. The automaton accepts a string $s$ if, and only if,
the string labels a path from the initial state to a final state.

For purposes of illustration, consider languages 1-s and 1-r. Language 1-s 
consists of the strings aaaa, aaab, aaba, abaa and abba. Language 1-r con- 
sists of aaab, abaa, aaba, abbb, and bbab. Their respective “prefix tree” 
transition diagrams are shown in Figure 1. 
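These definitions translate almost directly into code. The sketch below is an
illustration only, assuming a dictionary representation of a deterministic
$\delta$ (keyed by state and symbol) and integer state names; the helpers
`prefix_tree` and `accepts` are hypothetical names, and the state numbering need
not match Figure 1. It builds a prefix-tree automaton with one state per distinct
prefix and tests acceptance by following the path a string labels:

```python
from typing import Dict, Iterable, Set, Tuple

# delta maps (state, symbol) -> next state, so it is deterministic by construction.
Delta = Dict[Tuple[int, str], int]
Automaton = Tuple[Delta, int, Set[int]]  # (delta, q0, F)


def prefix_tree(strings: Iterable[str]) -> Automaton:
    """Build the prefix-tree automaton: one state per distinct prefix."""
    delta: Delta = {}
    finals: Set[int] = set()
    next_state = 1  # state 0 is the initial state (the empty prefix)
    for s in strings:
        q = 0
        for a in s:
            if (q, a) not in delta:
                delta[(q, a)] = next_state
                next_state += 1
            q = delta[(q, a)]
        finals.add(q)  # the state reached by the whole string is final
    return delta, 0, finals


def accepts(automaton: Automaton, s: str) -> bool:
    """True iff `s` labels a path from the initial state to a final state."""
    delta, q, finals = automaton
    for a in s:
        if (q, a) not in delta:
            return False  # no arc labelled `a` leaves q, so the path breaks off
        q = delta[(q, a)]
    return q in finals


lang_1s = ["aaaa", "aaab", "aaba", "abaa", "abba"]
pta = prefix_tree(lang_1s)
print(all(accepts(pta, s) for s in lang_1s))  # True
print(accepts(pta, "abab"))                   # False: not a string of 1-s
```

The prefix tree accepts exactly the strings it was built from; any
generalization must come from later changes to the automaton.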



Fig. 1. Transition state diagrams for uncompressed grammars: (a) language 1-s;
(b) language 1-r.



Since each string can be specified by a path through a deterministic automa- 
ton, we calculate one simple approximation of the data-encoding-length in bits 
given by the formula: 

$$\sum_{i=1}^{m} \sum_{j=1}^{|s_i|} \log_2 z_{i,j},$$
where $m$ is the number of sentences in the sequence of strings encoded, $|s_i|$ is
the length of the $i$th string $s_i$, and $z_{i,j}$ is the number of ways to leave the
state reached on the $j$th symbol of sentence $s_i$. (A more succinct encoding is
obviously possible when the probabilities of the transitions are not uniform; the
simple approximation suffices for the purposes of this preliminary investigation.)

To specify the automaton itself, we must specify all the triples $(q_1, a, q_2) \in \delta$,
and we must also specify the final states, so we calculate the grammar-encoding-length:

$$|\delta|\,\bigl[2\log_2|Q| + \log_2|\Sigma|\bigr] + |F|\,\log_2|Q|,$$

where $|\delta|$ is the number of triples in $\delta$ and $|F|$ is the number of final states in $F$.
The MDL of a language is defined as the sum of its grammar-encoding-length 
and its data-encoding-length. For language 1-s the MDL is 119.3 and for 1-r 
it is 168.0. 
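Both encoding lengths can be computed directly for the prefix-tree automata. The
sketch below is a hedged reading of the two formulas, not the authors' code: it
assumes $|\Sigma| = 2$, counts every state of the prefix tree in $|Q|$, and takes
$z_{i,j}$ to be the number of arcs leaving the state from which the $j$th symbol is
read (so a state with a single exit contributes zero bits). Under these
assumptions it gives 168.0 for 1-r, matching the value above; for 1-s it gives
131.3 rather than the 119.3 reported, so the authors' exact bookkeeping
presumably differs in some detail.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

Delta = Dict[Tuple[int, str], int]


def prefix_tree(strings: Iterable[str]) -> Tuple[Delta, Set[int], int]:
    """Prefix-tree automaton: returns (delta, final states, number of states)."""
    delta: Delta = {}
    finals: Set[int] = set()
    for s in strings:
        q = 0  # initial state
        for a in s:
            q = delta.setdefault((q, a), len(delta) + 1)  # new state per new prefix
        finals.add(q)
    return delta, finals, len(delta) + 1


def grammar_length(delta: Delta, finals: Set[int], n_states: int, n_symbols: int) -> float:
    """|delta| * (2 log2|Q| + log2|Sigma|) + |F| * log2|Q|."""
    log_q = math.log2(n_states)
    return len(delta) * (2 * log_q + math.log2(n_symbols)) + len(finals) * log_q


def data_length(delta: Delta, strings: List[str], q0: int = 0) -> float:
    """Sum of log2(z_ij), z_ij = number of arcs leaving the state a symbol is read from."""
    out_degree: Dict[int, int] = {}
    for (q, _a) in delta:
        out_degree[q] = out_degree.get(q, 0) + 1
    bits = 0.0
    for s in strings:
        q = q0
        for a in s:
            bits += math.log2(out_degree[q])
            q = delta[(q, a)]
    return bits


def mdl(strings: List[str], n_symbols: int = 2) -> float:
    delta, finals, n_states = prefix_tree(strings)
    return grammar_length(delta, finals, n_states, n_symbols) + data_length(delta, strings)


lang_1s = ["aaaa", "aaab", "aaba", "abaa", "abba"]
lang_1r = ["aaab", "abaa", "aaba", "abbb", "bbab"]
print(round(mdl(lang_1s), 1))  # 131.3
print(round(mdl(lang_1r), 1))  # 168.0
```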

We refer to language 1-s as “smooth” and language 1-r as “rugged.” The 
reasons for this are evident from Figure 2, which shows how the strings in each set 
are related by symbol substitution. (More generally, one might use deletion and 
duplication operators, as well as symbol-substitutions. Duplication and deletion 
are known to be important for the evolution of DNA and for natural language, 
but these operators add considerable complication for some of the comparisons 
we are interested in, so for this paper we shall restrict our attention to operators 
that are simple symbol substitutions.) 




Fig. 2. Hypercube representation of the landscape for strings of length 4. Nodes
connected by lines are one symbol-substitution away from each other. Black dots
are included in the language; gray dots are not. (a) Representation of language
1-s. (b) Representation of language 1-r.



Each string in 1-s is one symbol-substitution away from its nearest neighbor.
Only the symbol in one position changes between a string and its nearest
neighbor, so the strings are very similar, or regular, whereas in 1-r no string is
closer than two substitutions to any other, making the strings only weakly
related. Further, in the hypercube representation, the strings of 1-s lie on a
hyperplane (every string begins with a), a regularity that might be utilized for
compression; no such regularity is evident in 1-r. This occurs in much the same
way that the equation of a line offers a compressed representation of a set of
data points in linear regression. These two grammars also differ in their MDL
measures: the MDL of 1-s is only 71% that of 1-r.
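The distances shown in Figure 2 can be checked mechanically. The sketch below
(the function names are ours, for illustration) computes, for each string, the
Hamming distance, i.e. the number of symbol substitutions, to its closest other
string in the set, and confirms that every string of 1-s begins with a:

```python
from typing import List


def hamming(u: str, v: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(u) != len(v):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(u, v))


def nearest_neighbor_distances(strings: List[str]) -> List[int]:
    """For each string, the substitution distance to its closest other string."""
    return [min(hamming(s, t) for j, t in enumerate(strings) if j != i)
            for i, s in enumerate(strings)]


lang_1s = ["aaaa", "aaab", "aaba", "abaa", "abba"]
lang_1r = ["aaab", "abaa", "aaba", "abbb", "bbab"]
print(nearest_neighbor_distances(lang_1s))  # [1, 1, 1, 1, 1]
print(nearest_neighbor_distances(lang_1r))  # [2, 2, 2, 2, 2]
# Every string of 1-s starts with "a": they all lie on one face of the 4-cube.
print(all(s[0] == "a" for s in lang_1s))    # True
```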



5 Grammar compression 

Our model of language learning will be that of a child who, when listening to
an adult speaking a language s/he does not yet understand, tries out different
candidate grammars that might accept that language. The minimalist theory
supposes that those grammars found suitable are then examined further, and 
the grammar with the shortest MDL is preferred and tentatively accepted as 
describing the parent’s language. 

In our study, an agent is exposed to all the legal sentences in the language,
say of 1-s or 1-r. It constructs an automaton with associated grammar as
shown in Figure 1, above. The agent then compresses the original grammar by
attempting to combine states and transitions in such a way that the outcome
is still a deterministic automaton, and combines them if and only if the MDL of
the new automaton is smaller than that of the starting one. Given a machine
$A = (Q, \Sigma, \delta, q_0, F)$, the result of merging states $q_i, q_j \in Q$ is the
machine $A'$, which has these two states replaced by a new state $q_{ij}$ as follows:

$$A' = \bigl((Q - \{q_i, q_j\}) \cup \{q_{ij}\},\ \Sigma,\ \delta',\ q'_0,\ F'\bigr),$$

where $q'_0 = q_{ij}$ if $q_0$ is either $q_i$ or $q_j$ (otherwise $q'_0 = q_0$);
$F' = (F - \{q_i, q_j\}) \cup \{q_{ij}\}$ if either $q_i$ or $q_j$ is in $F$ (otherwise
$F' = F$); and $\delta'$ is the result of replacing all instances of both $q_i$ and $q_j$ by
$q_{ij}$ in all the triples $(q_n, a, q_m)$ that define $\delta$. This is applied recursively
in a hill-climbing manner.
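A compact sketch of this procedure follows. Several details are assumptions
rather than statements of the authors' method: states are integers and the
merged state $q_{ij}$ reuses the label $q_i$; a merge whose result would be
non-deterministic is simply skipped rather than repaired; each round applies the
single merge that lowers MDL the most, stopping when no merge helps; and MDL is
scored with the grammar- and data-encoding-lengths read in the simplest way
($|\Sigma| = 2$, with $z_{i,j}$ taken as the number of arcs leaving the state a
symbol is read from).

```python
import itertools
import math
from typing import Dict, List, Set, Tuple

Triple = Tuple[int, str, int]                # (q_n, a, q_m)
Machine = Tuple[Set[Triple], int, Set[int]]  # (delta, q0, F)


def prefix_tree(strings: List[str]) -> Machine:
    """Prefix-tree automaton with delta stored as a set of triples."""
    step: Dict[Tuple[int, str], int] = {}
    finals: Set[int] = set()
    for s in strings:
        q = 0
        for a in s:
            q = step.setdefault((q, a), len(step) + 1)
        finals.add(q)
    return {(p, a, r) for (p, a), r in step.items()}, 0, finals


def states(m: Machine) -> Set[int]:
    delta, q0, finals = m
    return {q0} | finals | {p for p, _, _ in delta} | {r for _, _, r in delta}


def is_deterministic(delta: Set[Triple]) -> bool:
    """True iff no state has two arcs with the same label."""
    seen: Set[Tuple[int, str]] = set()
    for p, a, _r in delta:
        if (p, a) in seen:
            return False
        seen.add((p, a))
    return True


def merge(m: Machine, qi: int, qj: int) -> Machine:
    """Replace qi and qj by a single state (labelled qi) in delta, q0 and F."""
    delta, q0, finals = m

    def rename(q: int) -> int:
        return qi if q == qj else q

    return ({(rename(p), a, rename(r)) for p, a, r in delta},
            rename(q0), {rename(q) for q in finals})


def mdl(m: Machine, strings: List[str], n_symbols: int = 2) -> float:
    """Grammar-encoding-length plus data-encoding-length (assumed reading)."""
    delta, q0, finals = m
    log_q = math.log2(len(states(m)))
    grammar = len(delta) * (2 * log_q + math.log2(n_symbols)) + len(finals) * log_q
    step = {(p, a): r for p, a, r in delta}
    out_degree: Dict[int, int] = {}
    for p, _a, _r in delta:
        out_degree[p] = out_degree.get(p, 0) + 1
    data = 0.0
    for s in strings:  # merging never removes a path, so every step exists
        q = q0
        for a in s:
            data += math.log2(out_degree[q])
            q = step[(q, a)]
    return grammar + data


def compress(strings: List[str]) -> Tuple[Machine, float]:
    """Greedy hill-climbing: apply the best MDL-reducing merge until none helps."""
    m = prefix_tree(strings)
    best = mdl(m, strings)
    while True:
        candidate = None
        for qi, qj in itertools.combinations(sorted(states(m)), 2):
            merged = merge(m, qi, qj)
            if not is_deterministic(merged[0]):
                continue  # this sketch skips merges that break determinism
            score = mdl(merged, strings)
            if score < best:
                best, candidate = score, merged
        if candidate is None:
            return m, best
        m = candidate


def accepts(m: Machine, s: str) -> bool:
    delta, q, finals = m
    step = {(p, a): r for p, a, r in delta}
    for a in s:
        if (q, a) not in step:
            return False
        q = step[(q, a)]
    return q in finals


lang_1s = ["aaaa", "aaab", "aaba", "abaa", "abba"]
compressed, score = compress(lang_1s)
print(round(mdl(prefix_tree(lang_1s), lang_1s), 1), "->", round(score, 1))
print(all(accepts(compressed, s) for s in lang_1s))  # True: merging only adds paths
```

Since each accepted merge removes one state and strictly lowers MDL, the loop
always terminates. A different search order or a merge rule that restores
determinism instead of skipping would explore more of the space and could reach
a different grammar.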

The compressed transition diagrams and hypercubes for languages 1-s and 
1-r are shown in Figures 3 and 4. 




Fig. 3. (a) Transition diagram of the compressed grammar for language 1-s. (b)
Hypercube representation of the sentences included in the compressed grammar.



The compressed grammar for 1-s is much smaller than the original (MDL = 56.8
vs. 119.3), due to changes both in the number of states (11 to 5) and in the
number of transitions (12 to 7). All of the accepted strings in this example are of
the correct length, but the compressed version generalizes to include the entire
outer cube of sentences in the hypercube. In “real life” such a generalization
may or may not be desirable.

Fig. 4. (a) Transition diagram of the compressed grammar for language 1-r. (b)
Hypercube representation of the sentences included in the compressed grammar.

The compressed version of language 1-r is also substantially smaller than
the original (MDL = 92.6 vs. 168.0). Again this resulted both from a decrease
in the number of states (15 to 7) and from a decrease in the number of
transitions (15 to 11). Compression did not lead to new planes or observable
regularities, except that the one new sentence of length 4 that was added, bbba,
was also at distance 2 from its nearest neighbor. Here, however, generalization
was such that sentences of lengths other than 4 were allowed, from length 2
upward without bound, so long as certain regularities, such as embedded tandem
repeats of bb, are observed.

6 Compression of languages 

Compression results from exploiting regularities in the data. We expect that 
languages with smooth string spaces contain more regularity to be exploited than 
languages with rugged string spaces. Therefore, we expect more compression 
from smooth languages. Is this the case? 

We created 40 languages with each of several degrees of smoothness. Each
consisted of 10 strings of length 6, drawn from the alphabet {a, b}.
Smoothness was varied as follows: distance-1, distance-2, distance-3 and random.
All strings were 1, 2, 3 or a random number of symbol-substitutions, respectively,
from their nearest neighbor, and were drawn successively from a random starting
string.
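One way to generate such languages is sketched below. It reads the description
above as a random walk: start from a random string and obtain each new string by
changing exactly $k$ randomly chosen positions of the previous one ($k = 1, 2, 3$),
or, for the random languages, draw every string independently. This reading,
the function names, and the decision to keep duplicate strings are our
assumptions.

```python
import random
from typing import List, Optional


def substitute(s: str, k: int, rng: random.Random, alphabet: str = "ab") -> str:
    """Return a copy of `s` with k distinct positions changed to a different symbol."""
    chars = list(s)
    for i in rng.sample(range(len(s)), k):
        chars[i] = rng.choice([c for c in alphabet if c != chars[i]])
    return "".join(chars)


def make_language(k: Optional[int], n_strings: int = 10, length: int = 6,
                  alphabet: str = "ab", seed: Optional[int] = None) -> List[str]:
    """Distance-k language for k in {1, 2, 3}; k=None gives a random language."""
    rng = random.Random(seed)
    current = "".join(rng.choice(alphabet) for _ in range(length))
    strings = [current]
    while len(strings) < n_strings:
        if k is None:
            current = "".join(rng.choice(alphabet) for _ in range(length))
        else:
            current = substitute(current, k, rng, alphabet)
        strings.append(current)  # duplicates are possible and are kept
    return strings


for k in (1, 2, 3, None):
    print(k, make_language(k, seed=42))
```

With a binary alphabet, a 2-substitution step always preserves the parity of the
number of b's, so any two distinct strings of a distance-2 language generated
this way differ in an even number of positions and can never be closer than 2
substitutions. A 3-substitution step flips the parity, so strings that are not
adjacent in the walk can end up only 1 substitution apart; the nearest-neighbor
distance of a generated distance-3 language can therefore be smaller than 3.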

These languages were presented to agents who created the finite state au- 
tomata grammars for them, then compressed the grammars using the algorithm 
described above. Compression was measured by MDL of original and compressed 
grammars. Error was measured by presenting the agents with all possible strings 
of length 6, then counting the number that were accepted but were not strings in 
the original language. Recall that the compressed grammar always accepts
the original language, so error can also be viewed as the amount of generalization
(to strings of length 6) that the grammar achieves; it says nothing about error
in recognizing strings of other lengths.
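The error measure can be sketched as an exhaustive check over all $2^6 = 64$
strings of length 6. The automaton format (a dictionary from state and symbol to
next state) and the example automaton, which accepts every length-6 string
beginning with a, are hypothetical and chosen only to show the count:

```python
import itertools
from typing import Dict, Iterable, Set, Tuple

Delta = Dict[Tuple[int, str], int]


def accepts(delta: Delta, q0: int, finals: Set[int], s: str) -> bool:
    q = q0
    for a in s:
        if (q, a) not in delta:
            return False
        q = delta[(q, a)]
    return q in finals


def generalization_error(delta: Delta, q0: int, finals: Set[int],
                         language: Iterable[str], length: int = 6,
                         alphabet: str = "ab") -> int:
    """Number of strings of `length` accepted by the automaton but absent from `language`."""
    target = set(language)
    count = 0
    for symbols in itertools.product(alphabet, repeat=length):
        s = "".join(symbols)
        if accepts(delta, q0, finals, s) and s not in target:
            count += 1
    return count


# Hypothetical automaton: a chain 0 -a-> 1 -{a,b}-> 2 ... -{a,b}-> 6, with 6 final.
delta: Delta = {(0, "a"): 1}
for i in range(1, 6):
    delta[(i, "a")] = i + 1
    delta[(i, "b")] = i + 1

language = ["aaaaaa", "aaaaab", "abbbbb"]
print(generalization_error(delta, 0, {6}, language))  # 29 = 32 accepted - 3 in the language
```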
As would be expected, smoother languages could be encoded more economically
than rugged ones. The amount of compression achieved is measured by the
compressed MDL, which was 126.6 for distance-1, 199.6 for distance-2, 174.8 for
distance-3 and 181.4 for the random languages.

It is evident that compressed MDL was much less (i.e. more compression 
was possible) for the distance-1 languages than for the distance-2 ones. This 
was expected. But the grammars for the distance-3 and random languages were,
unexpectedly, more compressible than those for distance-2.

One possible explanation for the higher compression of the distance-3 and
random languages is that the agents were unable to extract regularity and
simply accepted more strings into the language. The mean numbers of errors were
consistent with this explanation: 7.18 for distance-2, 29.37 for distance-3 and
27.42 for random languages. However, the mean number of errors for distance-1
was also large, 23.48, so the explanation remains unclear. This may change if the
agents are presented with more examples, so that the cost of encoding the data
is significantly higher for the distance-3 and random landscapes. We are
attempting to understand this better.



7 Evolution of language 

We explored the role of compression in the evolution of language in a simple
way. Each language initially began with 10 strings of length 6, drawn from the
alphabet {a, b} as above. There were 10 replicates of distance-1 languages and
10 of distance-2. The parent in generation n produced 10 sentences drawn at
random from its language and presented them to generation n + 1, which
generated the appropriate grammar. Generation n + 1 then compressed that
grammar and, using the compressed grammar, produced 10 more sentences for
generation n + 2, and so on. This was continued for 10 generations.
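The generational loop can be sketched as follows. Here each language is
represented as a finite set of sentences and sampled uniformly with replacement;
`learn` is a placeholder for the grammar construction and MDL compression of
Section 5 (for instance, it could return the length-6 strings accepted by the
compressed grammar). These choices, and the toy `memorizer` learner, are
illustrative assumptions rather than the authors' implementation.

```python
import random
from typing import Callable, List, Set

# A learner maps the sentences it heard to the (finite) set of sentences it will speak.
Learner = Callable[[List[str]], Set[str]]


def iterated_learning(initial: Set[str], learn: Learner, generations: int = 10,
                      n_sentences: int = 10, seed: int = 0) -> List[Set[str]]:
    """Languages at generations 0..generations, each learned from samples of the previous."""
    rng = random.Random(seed)
    history = [set(initial)]
    for _ in range(generations):
        parent = sorted(history[-1])  # sorted so that sampling is reproducible
        if not parent:
            break  # nothing left to transmit
        heard = [rng.choice(parent) for _ in range(n_sentences)]  # with replacement
        history.append(learn(heard))
    return history


def memorizer(heard: List[str]) -> Set[str]:
    """Hypothetical learner with no generalization: it keeps exactly what it heard."""
    return set(heard)


start = {"aaaaaa", "aaaaab", "aaaabb", "aaabbb", "aabbbb",
         "abbbbb", "bbbbbb", "bbbbba", "bbbbaa", "bbbaaa"}
for g, lang in enumerate(iterated_learning(start, memorizer)):
    print(g, len(lang))
```

Because the memorizer cannot generalize, each generation can only keep or lose
sentences, which makes the lossiness of a 10-sentence bottleneck easy to see; it
is the generalization produced by compression that allows a language to gain
strings, as the distance-1 languages did.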

Clear changes were observed in all languages. Foremost among the
changes were the numbers of strings in the language. While all began with an
average of 7.7 distinct strings, after 10 generations this had changed to an
average of 11.8 for distance-1 and 6.1 for distance-2. This is statistically
significant at the .05 level by a paired t-test. Transmission from generation 0
to 10 was lossy: on average, 2.3 (distance-1) and 5.3 (distance-2) of the strings
in the original set of examples had been lost by generation 10.

The mean smallest distance between strings stayed about the same in both the
distance-1 (from 1.0 to 1.18) and distance-2 (from 2.0 to 1.72) languages. We had
expected that smoothness would increase with time, but apparently this was not
the case, since the changes were not significant. Any change in the distance-1
languages had to be an increase, because they began at the lowest possible value,
and the mean distance did increase slightly. For the distance-2 languages, the
mean smallest distance decreased, but only slightly.
Somewhat paradoxically, the mean MDL of the grammars for both kinds of
language decreased, in spite of their lack of change in ruggedness. Figure 5
illustrates the change in MDL, showing a large change in the early generations,
with only modest change later.




Fig. 5. Mean Minimum Description Lengths for evolving languages of symbol substi- 
tution distances 1 and 2 for generations 0 to 10. 



It is clear that while grammatical complexity decreased in both, the
distance-2 grammars remained more complex, even though they admitted a much
smaller number of sentences.

8 Discussion 

We have examined several conjectures about compression in adaptive complex 
systems. We employed agents that could learn and evolve one class of formal 
languages, using one compression algorithm. This learning and evolution was 
based on syntax alone, without any reference to semantics, which is likely to be
important [10]. Although limited, our results do seem illuminating.

Conjecture 1: Compression aids in generalization. This was examined with
systems that learned and compressed languages of varied ruggedness. In all cases 
we observed a compression of grammar, and in nearly all cases the compressed 
grammars generalized to admit new strings into the language. 

It should be recognized that such generalization may be desirable, because 
it reduces the complexity of rules, but it can also admit mistakes. To date we 
have made only a cursory study of the tradeoff between MDL and error in our 
system, but there clearly are important issues that will warrant further study. 

Conjecture 2: Compression occurs more easily in a “smooth”, as opposed to
“rugged”, string space. We explored compression in systems where strings were
1, 2, or 3 symbol substitutions apart from their nearest neighbor or were
randomly placed in the sentence space. We observed that even the uncompressed
grammars were smaller for the smooth than for the rugged languages. Compression,
as measured by compressed MDL, was clearly greater in languages where sentences
were only 1 symbol substitution apart than in those where they were 2
substitutions apart. We interpret this to mean that patterns can more easily be
identified and exploited by the compression algorithm in the smoother language.
However, where smoothness is still less, in the distance-3 and random languages,
the compression is also greater than for distance-2 languages. The reasons for
this remain unclear, but the correlation between compression ratio, error rate,
and number of examples presented is important here and needs further
exploration.

Conjecture 3: Constraints from compression make it likely that natural lan- 
guages come to have smooth string spaces 

For the purposes of this paper we accept as true the theory that when learning 
language we (a) compress grammar with simple rules and (b), all else being equal, 
apply these compressed rules preferentially to new situations. The agents in our 
study automatically created grammars which admitted all legal sentences. These 
grammars eliminated some redundancy, but retained logical equivalence. They 
were already programmed, quite literally, to conform to this theory. They were 
also programmed to compress those grammars, when so directed. 

Here it is important to distinguish carefully between the smoothness of a 
language and the complexity of the grammar needed to describe it. While related, 
they are not the same. Both distance-1 and distance-2 languages evolved into 
languages with sentences that were, on average, separated by about the same 
number of symbol substitutions as when they started. That is to say, they did
not become smoother in the sense of becoming more connected or of consisting of
strings lying together on the same hyperplane. At the same time, the grammars
describing them came to have smaller MDLs. That is, both languages came to be 
described by simpler grammars. When we generated languages for compression, 
we observed that the smoother languages did have, on average, smaller MDLs, 
but not invariably so. Clearly there is some subtlety. The suggestion here is that
grammatical complexity does, indeed, decrease, but that this is not the same as
saying that the languages become smooth, as smoothness is used in this paper.

In his study of evolving bit strings, albeit with semantic content, Kirby [10] 
observed two phase transitions in grammatical structure. The first of these oc- 
curred when the languages became suddenly more expressive, with a concomitant 
increase in grammatical complexity. The second phase change occurred after a
high degree of expressivity had been achieved, when grammatical complexity
suddenly began to fall sharply. That study and ours agree that even in such
simple cases as evolving bit strings there are unlikely to be simple rules about
changes in the smoothness of languages or in grammatical complexity, though in
the long run both studies did result in simpler grammars.

In summary, we observed broad agreement with the conjectures made prior 
to the start of the study. It is evident, however, that even in our simple system 
there is significant subtlety that must be recognized in the inductive tradeoff of 
generalization through compression versus error of overgeneralization. 
Effective Lexicon Change
in the Absence of Population Flux 
Abstract. Multiple agents, equipped with a feature-based phonetic model and a
connectionist cognitive model, interact via the naming game paradigm, such that lexicon
formation and change are emergent properties of this complex adaptive system. Our
system converges on a coherent lexicon, and effective language change is demonstrated
even in the absence of a changing population, which calls into question claims made in
earlier work. We argue that our phonetic and cognitive models tend towards a cognitive
validity that was absent from previous work in this area, while maintaining the flexibility
of other systems.

1 Introduction 

The pace of technological advance and invention in today’s world forces the aware 
individual into intimate contact with language change, especially in the form of lexical 
creation. It is certainly possible to examine word-formation and word-adaptation 
processes which occur in response to a terminological void introduced by the creation 
of a new object or the discovery of a new idea; however, until recently practical 
investigations into longer-term linguistic change have been severely curtailed by the lack 
of appropriate experimental subjects. Where historical sources exist, some theoretical 
conclusions have been drawn, but written records often do not accurately capture 
phonological processes, and in any case, testing the predictions of such theories might
take years or even generations (assuming, of course, that the design of a controlled
experiment is possible in the first place).

Recently, members of Sony CSL Paris have begun to examine language change
as the emergent behaviour of a complex adaptive system. Such a view of linguistic
variety enables researchers to apply artificial life and multi-agent techniques to this
problem, thus circumventing the difficulties associated with prediction testing by moving
the domain of experimentation to the main memory of a computer. In a recent paper
by this group [8], four factors are postulated as necessary for “effective lexical change”
to occur: self-organization, stochasticity, agents tolerant of variation, and a changing
population.

We perform our investigations into the nature of lexical change within the framework 
of a multi-agent simulation using the naming game (first introduced in [6]) as a paradigm 
for agent interaction; however, we have augmented our implementation with a more 
realistic phonetic model than is present in [8], the starting point for our work. The 
behavioural mechanisms of our agents are based on an internal connectionist model. 

2 The Naming Game 

Research involving multiple software agents requires an unambiguous specification of 
the precise details of all possible agent-agent and agent-environment interactions. To 
this end, [6] introduces the naming game, an austere paradigm for interaction tailored
especially for the development, transmission, and evolution of a lexicon. 

The naming game is appropriate for any simulation with a fixed number of agents 
and objects; an interaction proceeds as follows: (1) Two agents are selected at random 
out of the population; one is designated as the Speaker, the other as the Hearer. (2) The 
Speaker chooses an object at random and names the object; that is, accesses an internal 
representation and produces a linguistic form. (3) The Speaker (virtually) points at the 
object. (4) The Hearer interprets this combination of linguistic and extra-linguistic 
information produced by the speaker to be a reference to a particular object. 

The game succeeds if the Hearer correctly interprets the information provided by the 
Speaker; if the two agents do not agree on the object referenced, then the game fails. 
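A single round of the naming game can be sketched as below. The `Agent`
protocol with `speak` and `interpret` methods is our own hypothetical interface,
and pointing is exact here (the smearing of pointing information is described in
Section 4). The toy `LexiconAgent`, which uses a fixed object-to-word table and
falls back on pointing only for unknown words, is likewise purely illustrative:

```python
import random
from typing import Dict, Protocol, Sequence


class Agent(Protocol):
    def speak(self, topic: int) -> str: ...
    def interpret(self, utterance: str, pointed_at: int) -> int: ...


def naming_game(agents: Sequence[Agent], n_objects: int, rng: random.Random) -> bool:
    """Play one round; return True if the Hearer identifies the Speaker's topic."""
    speaker, hearer = rng.sample(list(agents), 2)    # (1) pick Speaker and Hearer
    topic = rng.randrange(n_objects)                 # (2) choose an object...
    utterance = speaker.speak(topic)                 #     ...and name it
    pointed_at = topic                               # (3) point (exactly, in this sketch)
    guess = hearer.interpret(utterance, pointed_at)  # (4) Hearer interprets
    return guess == topic


class LexiconAgent:
    """Hypothetical toy agent with a fixed object -> word table."""

    def __init__(self, lexicon: Dict[int, str]) -> None:
        self.lexicon = dict(lexicon)
        self.meaning = {word: obj for obj, word in self.lexicon.items()}

    def speak(self, topic: int) -> str:
        return self.lexicon[topic]

    def interpret(self, utterance: str, pointed_at: int) -> int:
        # Trust the word if known; fall back on pointing only for unknown words.
        return self.meaning.get(utterance, pointed_at)


rng = random.Random(0)
shared = [LexiconAgent({0: "ba", 1: "di"}) for _ in range(2)]
crossed = [LexiconAgent({0: "ba", 1: "di"}), LexiconAgent({0: "di", 1: "ba"})]
print(naming_game(shared, 2, rng))   # True: both agents use the same words
print(naming_game(crossed, 2, rng))  # False: each word means the other object to the Hearer
```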

Presumably, upon completion of a naming game interaction, learning occurs; while 
the bulk of research uses the adaptive rules outlined in [6], there is no reason to suppose 
that this framework must be coupled with precisely those learning rules. We 
demonstrate that the naming game provides an excellent paradigm for use with agents 
possessing different internal mechanisms and learning procedures. 

3 The Phonetic Model 

We enhanced the original simulation by replacing the random generation of words from 
character sets which characterizes [8] with a binary feature representation where agents’ 
utterances are consonant-vowel pairs, a somewhat more realistic phonetic model. 

Chomsky-Halle features and the cardinal vowel system provide us with a loose basis
for our model. There are four consonant features (anterior, coronal, voiced, continuant)
and three vowel features (closed, mid, back), allowing 16 possible consonants and 8
possible vowels, giving our agents 128 potential abstract CV combinations.

Rather than purely binary values, we allow real-valued features across the interval 
[0,1], which changes the domain of the utterances of our Speaker agents from idealized 
phonemes to a naive abstraction of actual acoustic productions. The cardinal vowel 
system specifies infinitely variable tongue positions relative to standard points, and 
voicing delays and tongue positions for consonants also vary across speakers. In real 
world communication, the hearer takes this fuzzy encoding and interprets it as abstract 
phonemes; this is a process which our agents model. 
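The inventory size and the interpretation of real-valued productions as abstract
phonemes can be sketched as follows. The 0.5 threshold used by `binarize` and the
ordering of the seven features (consonant features first) are assumptions made
for illustration:

```python
import itertools
from typing import Sequence, Tuple

CONSONANT_FEATURES = ("anterior", "coronal", "voiced", "continuant")
VOWEL_FEATURES = ("closed", "mid", "back")


def binarize(value: float) -> int:
    """Map a real-valued feature in [0, 1] to an abstract binary value (threshold assumed)."""
    return 1 if value >= 0.5 else 0


def to_cv(production: Sequence[float]) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Interpret 7 real-valued features as (consonant bits, vowel bits)."""
    bits = tuple(binarize(v) for v in production)
    n_c = len(CONSONANT_FEATURES)
    return bits[:n_c], bits[n_c:]


consonants = list(itertools.product((0, 1), repeat=len(CONSONANT_FEATURES)))
vowels = list(itertools.product((0, 1), repeat=len(VOWEL_FEATURES)))
print(len(consonants), len(vowels), len(consonants) * len(vowels))  # 16 8 128
print(to_cv([0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.5]))  # ((1, 0, 1, 0), (0, 1, 1))
```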

4 Agent-Internal Connectionist Architecture 

Each agent has two completely separate neural nets: one (the S-Net, i.e. SpeechNet)
outputs an utterance given precise object information, and one (the H-Net, i.e.
HearNet) identifies a particular object given an utterance and appropriate
extra-linguistic information.

The S-Net is a two-layer, fully connected feed-forward network with an input node 
corresponding to each object, and an output node for each phonetic feature. After 
choosing an object as the topic of the conversation, input node activations are initialized 
to 0 except for the topic node, which is set to 1.0. Connection weights between the 
layers range over the interval [0,1], and since only one input node is on, the values of the 
appropriate weights translate directly as output layer activation levels. These activations 
can be interpreted as the value of the corresponding phonetic feature in the utterance.
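Because exactly one input node is active, the S-Net's forward pass amounts to
reading out one row of its weight matrix. A minimal pure-Python sketch, assuming
10 objects and the 7 phonetic features above (the function names are
hypothetical):

```python
import random
from typing import List

N_OBJECTS = 10
N_FEATURES = 7  # 4 consonant + 3 vowel features

Weights = List[List[float]]  # weights[o][f]: object input node o -> feature output node f


def init_snet(rng: random.Random) -> Weights:
    """Random initial language: every weight drawn uniformly from [0, 1)."""
    return [[rng.random() for _ in range(N_FEATURES)] for _ in range(N_OBJECTS)]


def snet_forward(weights: Weights, topic: int) -> List[float]:
    """One-hot input on the topic node, then a sum of products for each output node."""
    inputs = [1.0 if o == topic else 0.0 for o in range(N_OBJECTS)]
    return [sum(inputs[o] * weights[o][f] for o in range(N_OBJECTS))
            for f in range(N_FEATURES)]


w = init_snet(random.Random(0))
utterance = snet_forward(w, topic=3)
print(utterance == w[3])  # True: the output is just the topic's row of weights
```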

The H-Net interprets the combination of non-linguistic cues and phonetic
information from the Speaker’s S-Net, ultimately choosing an object which (hopefully)
matches the original topic. There are two sets of input nodes, corresponding to phonetic
features and object information. The activation levels from the output nodes of the
Speaker’s S-Net are transferred directly to the phonetic nodes, while the object nodes are
set up to represent the extra-linguistic ‘pointing’ information from the Speaker. We model the uncertainty
inherent in physical indications of location by probabilistically ‘smearing’ the virtual 
pointing information. Five of the ten objects are given a weight, thus modelling the 
viewer’s confusion about the target, which could arise because of visual similarity 
between the objects, their spatial proximity, or other, more complicated issues. The 
combination of this imperfect physical information and the phonetic communication 
allows the H-Net to decide upon a topic. 

Our simulation achieves an 86% success rate using phonetic information only, but 
this jumps to 95% when non-linguistic cues are also processed. Object information 
alone, however, proves too fuzzy, as only a 50% success rate is achieved on this basis. 

Following input layer initialization, activation levels are multiplied by the appropriate 
weights and a sum of products calculation is performed for each output node. 
Competitive inhibitory links between objects are then implemented as a winner-take-all 
model; the output node with the highest score is selected as the Hearer’s interpretation 
of the Speaker’s actions. 
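The H-Net's scoring and winner-take-all choice can be sketched in the same
style. The paper does not give the smearing distribution, so `smear_pointing`
below is a hypothetical stand-in that gives the topic and four randomly chosen
other objects random pointing strengths. The fixed object weights follow the
initialization described in Section 5 (high diagonal, lower random off-diagonal),
with the particular values 1.0 and 0 to 0.3 assumed:

```python
import random
from typing import List

Matrix = List[List[float]]


def smear_pointing(topic: int, n_objects: int, rng: random.Random,
                   n_smeared: int = 5) -> List[float]:
    """Hypothetical smearing: the topic and n_smeared - 1 other objects get random strengths."""
    others = rng.sample([o for o in range(n_objects) if o != topic], n_smeared - 1)
    pointing = [0.0] * n_objects
    for o in [topic] + others:
        pointing[o] = rng.random()
    return pointing


def init_object_weights(n_objects: int, rng: random.Random) -> Matrix:
    """Fixed object->object weights: high diagonal, lower random off-diagonal (values assumed)."""
    return [[1.0 if i == o else rng.uniform(0.0, 0.3) for o in range(n_objects)]
            for i in range(n_objects)]


def hnet_interpret(phonetic: List[float], pointing: List[float],
                   phon_w: Matrix, obj_w: Matrix) -> int:
    """Sum of products over both input groups, then winner-take-all over objects."""
    n_objects = len(obj_w)
    scores = []
    for o in range(n_objects):
        score = sum(x * phon_w[f][o] for f, x in enumerate(phonetic))
        score += sum(p * obj_w[i][o] for i, p in enumerate(pointing))
        scores.append(score)
    return max(range(n_objects), key=lambda o: scores[o])  # ties go to the lowest index


rng = random.Random(0)
phon_w = [[rng.random() for _ in range(10)] for _ in range(7)]  # phon_w[f][o]
obj_w = init_object_weights(10, rng)
pointing = smear_pointing(topic=4, n_objects=10, rng=rng)
print(hnet_interpret([0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6], pointing, phon_w, obj_w))
```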

5 The Training Regimen 

Learning only occurs when the naming game fails; individuals are unlikely to adapt their 
language model when communication succeeds. When a topic misunderstanding occurs 
between humans, the two participants do not simply leave; rather, the speaker is likely 
to repeat the word, or even touch the object to ensure successful communication. Thus 
it is reasonable to suppose that our Hearer agents, upon failure of the game, would have 
accurate information about both the Speaker’s utterance and the intended topic. If 
communication fails, the Hearer adapts such that in the future its speech will more 
closely resemble the Speaker’s and it will be more accepting of that Speaker’s accent. 

Weights in the S-Net are randomly initialized to values between 0 and 1, providing
our Speaker with a random initial language. The phonetic weights in the H-Net are
initialized on the interval [0,1], but represent the phoneme’s contribution to an object’s
score, rather than explicit phoneme values. H-Net weights between input and output
object nodes are not trained. Diagonal weights are given uniformly high values; other
weights are given random lower values. This approach simplistically models similarities
between objects.

During training each feature of the Hearer’s S-Net is examined independently to 
determine if it would binarize to the same value as the corresponding feature in the 
Speaker’s utterance. If so, its value is reinforced; if not, it is punished. The punishment 
equation moves values towards 0.5, while the reinforcement function tends towards 1 or 
0, depending on the weight’s polarity. Random fuzz of 5% is also required; otherwise
the punishment function will never move a weight across the fixed point of 0.5.
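The paper does not give the reinforcement and punishment equations, so the
sketch below is one hypothetical instance of the rule just described: each weight
moves a fixed fraction (`RATE`, assumed) of the way towards its target (the nearer
extreme for reinforcement, 0.5 for punishment), plus uniform noise of up to
±0.05 as our reading of the 5% fuzz, clipped to [0, 1]:

```python
import random
from typing import List

RATE = 0.1   # hypothetical step size
FUZZ = 0.05  # "random fuzz of 5%", read here as uniform noise in [-0.05, 0.05]


def binarize(v: float) -> int:
    return 1 if v >= 0.5 else 0


def train_snet_row(row: List[float], speaker_utterance: List[float],
                   rng: random.Random) -> List[float]:
    """Update the Hearer's S-Net weights for the topic, one feature at a time."""
    new_row = []
    for w, heard in zip(row, speaker_utterance):
        if binarize(w) == binarize(heard):
            target = 1.0 if w >= 0.5 else 0.0  # reinforce: towards the nearer extreme
        else:
            target = 0.5                       # punish: towards the fixed point 0.5
        w = w + RATE * (target - w) + rng.uniform(-FUZZ, FUZZ)
        new_row.append(min(1.0, max(0.0, w)))  # keep weights inside [0, 1]
    return new_row


rng = random.Random(0)
row = [0.40]  # Hearer's feature binarizes to 0, the Speaker's value (0.9) to 1
for _ in range(200):
    row = train_snet_row(row, [0.9], rng)
print(binarize(row[0]))  # 1: the noise carried the weight across 0.5
```

Without the noise term, punishment alone would only approach 0.5 from below. With
these particular constants, a weight that has reached 0.5 or more while agreeing
with the Speaker cannot be pushed back by the noise, since
0.9w + 0.1 - 0.05 ≥ 0.5 whenever w ≥ 0.5.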

The only trained weights in the H-Net are those between the phonetic input nodes and 
the output nodes (object weights remain fixed at their initial values). Again, training
occurs only when the Hearer has chosen the wrong object; this weight modification 
makes the H-Net more likely to settle on the topic given the same phonetic input. 

Accordingly, we decrease the score of the false positive, and increase the score of the 
correct answer, a straightforward procedure which does not complicate the dynamics of 
the network, and gives weight values in a reasonable range without explicit constraints. 
Furthermore, there is a certain level of cognitive justification, since humans are more 
likely to take note of errors and correct responses than to try to correct near-mistakes. 
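A minimal sketch of the H-Net's phonetic layer and its error-driven update follows. It assumes a plain additive rule with a hypothetical step size `rate`; the fixed object-to-object links and their winner-take-all competition are reduced here to a simple arg-max, so this illustrates the idea rather than reproducing the authors' network.

```python
def hnet_scores(weights, features):
    """Sum-of-products score of each object (output node) for a phonetic input."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def interpret(weights, features):
    """Winner-take-all: index of the highest-scoring object."""
    scores = hnet_scores(weights, features)
    return max(range(len(scores)), key=scores.__getitem__)

def train_hnet(weights, features, topic, rate=0.05):
    """Adjust phonetic weights in place, only when the Hearer picks the wrong object.

    Returns the object the Hearer chose before any update.
    """
    guess = interpret(weights, features)
    if guess == topic:
        return guess                       # success: no learning
    for k, x in enumerate(features):
        weights[topic][k] += rate * x      # raise the score of the correct answer
        weights[guess][k] -= rate * x      # lower the score of the false positive
    return guess
```

Because the update is proportional to the input features, it moves exactly the two scores mentioned in the text and leaves all other objects untouched.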

6 The Simulation World 

There are 20 agents and 10 objects in our world, numbers chosen to coincide with [8]. 
However, our simulation successfully scales to at least 50 agents and 20 objects. 

There is no attempt to make the agents internally consistent at ‘birth’, and we never
explicitly test for internal consistency. While this could result in the unfortunate
situation where an agent cannot understand itself, in practice this does not occur.

In experimental runs, agents are selected to speak in order, with a randomly chosen 
Hearer. In simulation runs with population flux, every 2000 games a random individual 
is removed from the population and a new, randomly initialized agent takes its place. 
Since simulation runs consist of either 100,000 or 400,000 instances of the naming 
game, either 50 or 200 new individuals are created during a run with population change. 
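The run protocol of this section can be sketched as a generator of game events. The hearer is drawn uniformly from the other agents, which is an assumption (the text only says it is randomly chosen), and all names are hypothetical. Counting replacement events reproduces the arithmetic above: 100,000 / 2000 = 50 and 400,000 / 2000 = 200.

```python
import random

def naming_game_schedule(n_agents, n_games, flux_every=None, seed=0):
    """Yield (speaker, hearer, replaced) for each game of a run.

    speaker  -- agents speak in turn (index cycles 0..n_agents-1)
    hearer   -- a uniformly random agent other than the speaker (assumed)
    replaced -- index of the agent swapped for a fresh one, or None
    """
    rng = random.Random(seed)
    for game in range(n_games):
        speaker = game % n_agents
        hearer = rng.randrange(n_agents - 1)
        if hearer >= speaker:
            hearer += 1                      # skip the speaker itself
        replaced = None
        if flux_every and (game + 1) % flux_every == 0:
            replaced = rng.randrange(n_agents)
        yield speaker, hearer, replaced

def count_new_agents(n_agents, n_games, flux_every):
    """Number of freshly initialized agents created during a run."""
    return sum(r is not None
               for _, _, r in naming_game_schedule(n_agents, n_games, flux_every))
```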

7 Experimental Results and Discussion 

In our long-term trials, we achieved success rates around 95% when the population was 
stable, and around 80% with a dynamic population. We ran 9 short and 6 long 
simulations with a static population, and 9 short and 7 long simulations with population 
flux. As can be seen from Table 1, we found lexical change in both the dynamic and 
static populations. Lexical form change is defined as a single form which is spoken by 
60% or more of the population for a significant period of time (at least 10% of the 
simulation length), which is then replaced by a second form which comes to dominance. 
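The definition of lexical form change can be made operational as below, as a sketch. Here `history` is a hypothetical list of population snapshots for one object (each mapping a form to its share of speakers), dominance is assumed to require an uninterrupted run, and the replacing form is assumed to count as having "come to dominance" once it also reaches the 60% threshold.

```python
import math

def has_form_change(history, threshold=0.6, min_frac=0.1):
    """True if a form that held >= threshold share for at least min_frac of
    the run is later displaced by another form reaching the threshold."""
    min_len = max(1, math.ceil(min_frac * len(history)))
    run_form, run_len = None, 0     # current uninterrupted dominance run
    established = None              # form whose run lasted long enough
    for shares in history:
        top = max(shares, key=shares.get) if shares else None
        if top is None or shares[top] < threshold:
            run_form, run_len = None, 0
            continue
        if established is not None and top != established:
            return True             # a second form has come to dominance
        run_len = run_len + 1 if top == run_form else 1
        run_form = top
        if run_len >= min_len:
            established = top
    return False
```

Since shares sum to at most 1, at most one form can be above 60% at any time, so taking the top form of each snapshot is enough.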

It is the final claim in [8] that four factors are necessary for effective lexicon change
to occur. It is difficult to see how one could prove, through simulation only, that these
four factors form necessary as well as sufficient conditions for lexical change; indeed,
our investigations disprove the necessity of the fourth factor, population flux, showing
that while lexical change is much more frequent when population change is introduced,
it occurs even in a static population.

One might argue that our definition of language change is too weak; however, a 
typical example has a form spoken by 65% of the population for 40,000 instances of the 
naming game before it is replaced. By contrast, it often takes as little as 5,000 instances 
to achieve stability in cases where a single form dominates with >90% of the speakers. 

Even a single instance of change refutes the claim that population flux is a necessary
condition for lexicon change; we recorded lexicon change in 4 of the 9 short runs with a
static population. This means that in almost 50% of the runs with static populations, at
least one form underwent effective lexicon change. Furthermore, the incidence of lexical
change is higher in the longer runs, which indicates that it is not an artifact of the
initialization of the system.

Table 1: Percentage Form Change

                  Static            Dynamic
               Short    Long     Short    Long
Form change      6%      10%      13%      47%

Unfortunately, ‘effective’ change has not yet been defined. Perhaps a form switch 
after a period of 60% dominance is not ‘effective’ change. We do not observe forms 
which are completely (e.g. >90%) dominant falling into disuse, but our results are from 
only 15 simulation runs; perhaps in 100 or 1000 runs this might occur. Our results
suggest that even under this strong definition, ‘effective’ language change can occur
in a static population, and we have amply demonstrated the extinction of forms spoken 
by a majority of speakers (13% of all forms in our longer simulations). 

9 Conclusions and Future Work 

We have shown that the naming game is an effective paradigm even when completely 
divorced from its original implementation, and that results similar to those reported in
[7] and [8] can be obtained using a connectionist architecture for the agents’ cognitive 
model, and with a phonetic model which is more linguistically accurate. These results 
and other recent work indicate that modelling language as the emergent behaviour of a 
complex adaptive system can be a valuable tool for linguistic investigation. 

We would like to model physical constraints of the vocal tract so as to have the 
agents produce more realistic sound combinations, especially with an expanded feature 
set, allowing a larger number of phonemes. We hope to introduce an object model in 
which objects are represented by feature vectors rather than simply atomic nodes, to see 
if hierarchical concepts might be instantiated as lexical items under these conditions. 

The occurrence of lexical change in a static population, which we’ve observed 
contrary to the expectations of [8], merits further investigation. [8] grafts stochastic 
measures onto a deterministic backbone, and uses a rigid, orthogonal phonetic model, 
training agents on complete words. We use an inherently stochastic connectionist 
approach, and our real-valued phonetic model increases flexibility. In fact, our H-Nets
are trained to accept a schema (in the sense of [4]), and the S-Net of the Hearer is trained 
on the piecewise error. We suspect that these modifications allow our simulation to be 
permissive enough of variation so as not to require the external prodding via population 
flux which is necessary to produce lexical change in [8]. 
Abstract. The naming game is a formal mechanism that describes the
development of a lexicon in a society of culturally interacting agents.
Here we use a cellular automaton version of this game to study
the influence of an extra-linguistic structure on the evolution of the
lexicon, as well as the influence of language on this a priori structure.
This extra-linguistic structure is coded by first giving agents a location in
a 2-D world, and then by allowing them to move according to the outcome
of the naming games. The results we present show strong self-organization
phenomena, such as the appearance of language and geographical clusters,
in addition to the basic properties of the game (high communication success).



1 Introduction 

Language as a complex dynamical system has been increasingly studied in the
last decade. Formal models have been built to investigate the emergence of
language through self-organisation in a society of culturally interacting agents.
This work is done both in simulations and with robotic agents ([1]). One of
the questions raised is: how can a lexicon shared by many agents emerge? How
can initially random form-meaning associations self-organize through cultural
evolution? The naming game is a formal mechanism that was introduced to tackle
this question ([3]); it is described in section 2. Many results have already
been obtained, but most of them concentrated on the language itself and did not
consider the possibility of an a priori extra-linguistic structure in the agent
society. The first step was taken in [2], where the idea of spatialization was
introduced: the agent society is given a fixed topological structure (based, for
instance, on geographical or social distance) that determines with whom agents
may speak, as we will see in section 2. We present new results in section 3,
based on a purely cellular automaton model of the naming game. We show that
this topological structure has an important impact on lexicon development, and
that it gives rise to emergent phenomena such as the appearance of what we call
language clusters. Then, in section 4, we generalize this model with a variable
topological structure in which agents move according to the outcome of the
naming games they are involved in; the language thus modifies the structure.
2 The Naming Game: Formal Model

Let us first define a lexicon L: it is a set of form-meaning associations
$L = \{(f_i, m_i, s_i)\}$, where a form $f_i$ is a symbol, a meaning $m_i$ denotes a
category/object, and $s_i$ is the score of the association, constrained by $s_i \in [0, 1]$.
Two operations can be performed on a score: increase by SC or decrease by
SC. Because of the constraint $s_i \in [0, 1]$, when the result of an operation is
greater than 1, it is mapped to 1, and when it is smaller than 0, it is mapped
to 0. Moreover, both forms and meanings can appear several times in different
associations.

An agent $a_i$ is here a tuple $a_i = ((x_i, y_i), L_i, W_a, W_c, IS, SC)$, where $(x_i, y_i)$ is
its location in a two-dimensional world, $L_i$ is its current lexicon, which is initially
empty, $W_a$ is the probability of accepting a new association $(f_i, m_i, s_i)$ when needed,
as explained below, $W_c$ is the probability of creating a new association $(f_i, m_i, s_i)$
when needed, IS is the initial score of accepted and created associations, and SC is
the quantity by which a score can be changed (increased or decreased) when
needed.

A society of agents is defined as the triple $Soc = (\{a_i\}, loc, \{m_i\})$, where $\{a_i\}$
is the set of agents, loc is the locality used as explained hereafter, and $\{m_i\}$ is the
set of meanings/objects that agents have to name by associating one
or several forms with them.

A round of the naming game consists in randomly picking one agent, called
the speaker s, who in turn randomly chooses one of its loc nearest neighbors
(the hearer h), according to the Euclidean distance. Then the speaker randomly
chooses a topic $m \in \{m_i\}$. Let
$E = \{assoc_k = (f_k, m_k, s_k) \mid (m_k = m) \wedge (assoc_k \in L_s)\}$:

- if $E = \emptyset$, then the speaker has no form for m, and with probability $W_c$
he creates a random new form f and adds $(f, m, IS)$ to $L_s$. The round ends
and the result is said to be FAILURE.

- if $E \neq \emptyset$, then s randomly chooses an association $(f, m, s) \in E$ among
those with maximal score; f will be called the last preferred form for m of
agent s, or just the preferred form for m. Then, s points to m with an
extra-linguistic tool (the hearer always understands what the topic is here), and
utters f. Let us now consider the predicate
$P = \exists s^* : ((f, m, s^*) \in L_h) \wedge (s^* > 0)$:

• if P is false, then the hearer does not understand and the
round ends with the result FAILURE. The speaker decreases the score of
its association involving f and m. The association $(f, m, IS)$ is added
to $L_h$ with probability $W_a$.

• if P is true, then the hearer does understand, and the
round ends with the result SUCCESS. The speaker increases the score of
its association involving both f and m, and decreases all its other associations
involving either f or m.
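The round defined above can be sketched in Python as follows. This is a minimal illustration under several assumptions: forms are random 4-letter strings, the hearer is drawn from the speaker's loc nearest neighbors, lateral inhibition on SUCCESS is applied to the speaker's other associations only, and the hearer's lexicon is left unchanged on SUCCESS because the text does not specify an update. Class and function names are hypothetical.

```python
import math
import random
from dataclasses import dataclass, field

@dataclass
class Assoc:
    form: str
    meaning: int
    score: float

@dataclass
class Agent:
    x: float
    y: float
    lexicon: list = field(default_factory=list)
    wa: float = 0.1   # W_a: probability of accepting a new association
    wc: float = 0.1   # W_c: probability of creating a new association
    IS: float = 0.1   # initial score of accepted/created associations
    SC: float = 0.1   # score increment/decrement

def clamp(s):
    """Map a score back into [0, 1]."""
    return min(1.0, max(0.0, s))

def new_form(rng, length=4):
    return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(length))

def nearest(agents, agent, loc):
    """The loc agents closest to `agent` by Euclidean distance."""
    others = [a for a in agents if a is not agent]
    others.sort(key=lambda a: math.hypot(a.x - agent.x, a.y - agent.y))
    return others[:loc]

def play_round(agents, meanings, loc, rng):
    """Play one round; return True for SUCCESS, False for FAILURE."""
    speaker = rng.choice(agents)
    hearer = rng.choice(nearest(agents, speaker, loc))
    m = rng.choice(meanings)
    E = [a for a in speaker.lexicon if a.meaning == m]
    if not E:
        if rng.random() < speaker.wc:           # invent a form for m
            speaker.lexicon.append(Assoc(new_form(rng), m, speaker.IS))
        return False
    best = max(a.score for a in E)
    chosen = rng.choice([a for a in E if a.score == best])  # preferred form
    f = chosen.form
    understood = any(a.form == f and a.meaning == m and a.score > 0
                     for a in hearer.lexicon)           # predicate P
    if not understood:
        chosen.score = clamp(chosen.score - speaker.SC)
        if rng.random() < hearer.wa:            # hearer may adopt (f, m, IS)
            hearer.lexicon.append(Assoc(f, m, hearer.IS))
        return False
    chosen.score = clamp(chosen.score + speaker.SC)
    for a in speaker.lexicon:                   # lateral inhibition
        if a is not chosen and (a.form == f or a.meaning == m):
            a.score = clamp(a.score - speaker.SC)
    return True
```

A simulation with the default parameters of section 3 would create 100 agents with random coordinates and call `play_round(agents, [0, 1, 2, 3], 8, rng)` repeatedly, recording the returned SUCCESS/FAILURE values.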
3 Experiments with agents that have a fixed position

The first measure we use is the evolution of SUCCESS/FAILURE with the number
of rounds. The second one is what we call the evolution of form-spread for a
given meaning m with the number of rounds. The form-spread for m in round r is
defined as
$FS_{m,r} = \{(f_i, p_i) \mid (\exists a \in Soc : f_i = \text{last preferred word of } a \text{ for } m) \wedge (p_i = \text{percentage of agents that have } f_i \text{ as their preferred form for } m)\}$.
The form-spread graphs for m are thus composed of $\max_r |FS_{m,r}|$ curves, each
one corresponding to one $f_i \in FS_{m,r}$ and representing the evolution of the
associated $p_i$ as a function of the number of rounds. This measure makes it
possible to follow the evolution of a given preferred form for a given meaning
over the rounds and within the population of agents. Other measures will be
presented when needed.
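As a sketch, the form-spread can be computed from snapshots of the agents' lexicons, represented here as hypothetical lists of `(form, meaning, score)` tuples. The current highest-scoring form stands in for the "last preferred" form, which is an approximation, and percentages are taken over all agents, including those with no form for m.

```python
from collections import Counter

def preferred_form(lexicon, m):
    """Highest-scoring form for meaning m (first one on ties), or None."""
    candidates = [(score, form) for form, meaning, score in lexicon if meaning == m]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]

def form_spread(lexicons, m):
    """FS_{m,r}: map each preferred form to the percentage of agents using it."""
    prefs = Counter(preferred_form(lex, m) for lex in lexicons)
    prefs.pop(None, None)                 # agents with no form for m
    return {f: 100.0 * c / len(lexicons) for f, c in prefs.items()}

def n_curves(snapshots, m):
    """max_r |FS_{m,r}|: number of curves in the form-spread graph for m."""
    return max(len(form_spread(lexs, m)) for lexs in snapshots)
```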

The experiments in this section deal with agents whose location does not
change with time. All agents have the same parameters $W_a$, $W_c$, IS and SC.
When parameters are not specified, they take their default values: 100
agents randomly located in 2-D space, 4 meanings, loc = 8, $W_a$ = 0.1, $W_c$ = 0.1,
IS = 0.1 and SC = 0.1.

The first results yielded by these experiments are that, as in the non-spatialized
naming game (NSNG) ([2]), success reaches more than 85 percent after 1500
games, and does so at the same speed. But unlike the NSNG, where one form rapidly
dominates for a given meaning, several stable forms appear here and reach an
equilibrium where each one keeps its percentage of users: Figure 1 shows the
form-spread for one of the meanings. This holds for nearly all parameters, except
when IS = 1, where only one form eventually dominates.

Figure 1: evolution of form-spread for meaning 1 with 100 agents

Moreover, language clusters for a given meaning m appear: users of a given
preferred form for m are geographical neighbors. In addition, experiments
repeated 100 times with 300 randomly located agents yielded the following results
about these clusters:

- for a given locality, the number of clusters is very stable;

- the number of clusters at the convergence state, as a function of locality,
follows a power law;

- the occurrence of a given cluster size for a fixed locality also follows a power
law (as a function of this size).
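One common way to check such power laws is a least-squares fit of a straight line in log-log coordinates, $\log y = \log a + b \log x$. The sketch below assumes that method (the text does not say how the fits were made) and uses a hypothetical helper name; `xs` could be localities and `ys` the corresponding numbers of clusters.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y); return (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                    # slope in log-log space = exponent
    a = math.exp(my - b * mx)        # intercept gives the prefactor
    return a, b
```

A straight-line fit in log-log space is simple but can be biased for noisy tails; it is used here only to show what "follows a power law" means operationally.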

An interesting phenomenon appears when we introduce agent flux: every
n rounds, one agent is randomly chosen and removed while a fresh one (with an
empty lexicon) is added at a random place. Here, flux is introduced only once
the equilibrium has been reached. A low flux (every 30 games in a population
of 25 agents) produces no change in communication success, but destabilizes
language clusters and, surprisingly, one preferred form quickly dominates for
a given meaning (see Figure 5), in the same way as in the non-spatialized
naming game: instead of bringing diversity, agent flux brings unity to the
language. Medium flux (every 15 games) shows the same phenomenon, but now
success falls to about 50 percent. Finally, and as expected, when flux is high
(every 10 games), language collapses: success falls to 30 percent and no form
manages to establish itself.




Figure 5: form-spread for meaning 1 with 25 agents
and flux (n = 30) introduced after 1400 rounds



4 When Agents Move 

Now that we have studied the basic properties of the impact of a structure in the
society on the development of a lexicon, we investigate the case in which
agents no longer have a fixed location but instead move according to the outcome
of the rounds; topology and language are bidirectionally coupled. Let us first
specify how this coupling is defined. At the end of each round, if the result is
a SUCCESS, then the speaker moves towards the hearer in the following way:
$(x_s, y_s) \mapsto (x_s + C(x_h - x_s),\; y_s + C(y_h - y_s))$, where $(x_s, y_s)$ is the
position of the speaker before it moves, $(x_h, y_h)$ is the position of the hearer,
and C is a constant whose typical value is expressed in terms of $x(a_j)$, the
abscissa of agent $a_j$ in the first round; this makes agent displacements very
smooth. When the result of a round is a FAILURE, the speaker performs the same
displacement but with -C instead of C. Finally, in all experiments agents
have a random position in the first round.
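The displacement rule can be written as a one-line update, sketched below. The parameter `c` stands for C, whose typical value is left to the caller, and the function name is hypothetical.

```python
def move(speaker, hearer, success, c):
    """New speaker position after one round: a step of size c towards the
    hearer on SUCCESS, the same step with -c (away from it) on FAILURE."""
    (xs, ys), (xh, yh) = speaker, hearer
    k = c if success else -c
    return (xs + k * (xh - xs), ys + k * (yh - ys))
```

On SUCCESS the speaker-hearer distance shrinks by the factor (1 - C); on FAILURE it grows by (1 + C), so a small C keeps the motion smooth.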

Let us now turn to what we observe. First, the evolution of success is the same
as when agents did not move, contrary to the a priori expectation that it would
grow faster because agents try to avoid failure and foster success: the reason
is that meanings interact through space. On the other hand, we also reach a
similar stable state in form-spread for a given meaning, but the difference is
that the number of stable forms is here much smaller and grows very slowly with
locality and population size, and that these stable forms are not frozen forever:
small changes appear, explained by the bumping of clusters as described below.
The most interesting result comes if we now look at the evolution of topology
in relation to these stable forms for a given meaning. Indeed, geographical
clusters (defined only by the location of agents) quickly emerge, as shown in Figure 6.




Figure 6

Moreover, most of the time, these geographical clusters are also language
clusters for every meaning. This means that within the same geographical cluster,
most agents share the same preferred word for each meaning. Because agents
do not stop moving even when they are in a cluster with other agents having
the same preferred forms, clusters also move, in a random-walk manner.
The consequence is that sometimes two clusters simply bump into each
other. Two behaviors may then appear: either they bounce and continue their
walks independently, or they melt together. Melting is possible when these
two clusters have a common form for one or more meanings: this constitutes
an attraction force that can hold them together. When they melt, in the case
where IS is close to 1, the resulting cluster has a very interesting property: one
form soon dominates for each meaning. The important thing is that the origin
of these winning forms (one cluster or the other) is random. Thus we observe
a real fusion of languages: for instance, cluster $c_1$ could have form $f_1$ for $m_1$
and $f_2$ for $m_2$, cluster $c_2$ could have form $f_3$ for $m_1$ and $f_4$ for $m_2$, and the
resulting cluster $c_1 + c_2$ can happen to have form $f_1$ for $m_1$ and form $f_4$ for $m_2$.

5 Conclusion 

We have shown that introducing extra-linguistic structure in a society of agents 
that use the mechanism of the naming game leads to the emergence of language 
clusters. Moreover, by coupling the outcome of the rounds to the displacements 
of agents, we saw that language itself shapes the extra-linguistic structure. 
Abstract. Categorization dynamics, viewed as the clustering of words in word
relations, is studied by a constructive approach, which is suited to inquiry in
evolutionary linguistics with a dynamical view of language. Word meaning
is represented by the relationships among words. These relationships should
be derived from the usage of language. Based on this usage-based
view, we define an algorithm to evaluate word relationships. Using the
algorithm, the cluster structure of words and its dynamics are shown in a
model with communicating artificial agents. The relevance of clustering
to linguistic categorization is discussed.



1 Dynamical View on Language 

There are two ways of viewing language: structurally and dynamically. The 
structural view is a static one in which language structure, for example, syn- 
tax, dictionaries, or pragmatic rules, offers idealized approaches to language. 
The alternative view is dynamic. It concentrates on the actual use of language
rather than abstract notions of how language ought to be. The value of the second
approach can be better understood by thinking of metaphor. Metaphorical
expressions are creative and dynamic precisely because they can “bend” or
“break” the rules of conventionally structured language. When we produce or
understand metaphorical expressions, especially creative or unique metaphors,
our internal models should change. We cannot judge such creative expressions
as valid or invalid, since they are so novel that the conventional language
structure cannot assess them. Instead, we should consider whether the expressions
are to be accepted or not. If we accept them, our internal structure changes and
language structure might also come to be modified. In the dynamical view, the
whole system of such dynamical processes is considered as ‘language’.

Constructive approaches are highly advantageous to understanding complex 
systems [4]. These approaches are also useful for studying evolutionary
linguistics [7]. In contrast to conventional linguistics, which attempts to describe various
language phenomena, in the constructive approach the emergence of global or- 
der as language-like behavior is modeled through interaction among individuals. 
However, not only emergence but also the dynamics of global order should be ob- 
served in constructive models, since language is indeed an ever-changing system. 
Perhaps the internal dynamics of individuals should be taken into consideration
when studying evolutionary language systems, so that individuals can change their
internal states and their relationships to others and to circumstances.



2 Modeling: Word Relationship and Conversation

We have proposed a usage-based viewpoint on meaning [2], which claims
that the meanings of words should be discussed in terms of how language is used [9].
The interrelationship among words can, to some extent, be employed as a
representation of the meanings of words. This point of view implies that the
relationship of one word to other words should be derived from analyzing the usage
of the word in the language, not entirely from its indication or reference. Moreover,
a word in a sentence is understood not only from its relation to the entities
mentioned in the sentence but also from its position in the whole system of language.

Based on this viewpoint, we discuss dynamics of categorization by observing 
how the relationship among words changes through conversations. Building
relationships in the use of language is a dynamical process performed by language users.
We call this process the sense-making process [1] to emphasize its subjective nature.
The sense-making process is modeled by positioning a word in the relationship 
among all words. 

The algorithm to evaluate the relationship between words is basically derived
from Karov and Edelman’s work [5], with two revisions. The first is to calculate
relationships dynamically in the course of conversation, since what interests us
is not the final form of a category but the dynamics of categorization. The
second is to consider ‘texts’ at a higher level than sentences^. A text is a stream
of sentences uttered and accepted. The relationship between words is defined by
the linear combination of the usage-similarity and appearance-similarity terms
using a coefficient α, as follows^:

$R(w_i, w_j) = \alpha \cdot (\text{usage-similarity}) + (1 - \alpha) \cdot (\text{appearance-similarity})$.

The first term is designed to calculate the usage similarity of words in sentences
by considering the syntagmatic relationships between words, i.e., words used in
the same sentence are strongly related. Since this algorithm is applied iteratively
to each sentence in texts, words that are not used in the same sentence but occupy
the same position in different sentences strengthen their relationship. In other
words, this algorithm is able to capture paradigmatic relationships from syntagmatic ones.

The second term captures the similarity among patterns of appearance of words
in texts. Words with similar patterns of appearance across texts, e.g., words
used much more often in particular texts but not in others, strengthen their
relationship. Conversely, words with different patterns of appearance weaken their
relationship. This is realized by calculating the correlation of appearance in texts.
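A sketch of the combination rule is given below. It assumes that the usage-similarity value is supplied by the iterative procedure derived from Karov and Edelman (detailed in [3] and not reproduced here), and it takes the appearance-similarity to be the Pearson correlation of per-text occurrence counts, which is one reading of "the correlation of appearance in texts". Function names are hypothetical.

```python
import math

def appearance_similarity(counts_i, counts_j):
    """Pearson correlation of two words' occurrence counts across texts
    (0.0 if either word appears equally often in every text)."""
    n = len(counts_i)
    mean_i, mean_j = sum(counts_i) / n, sum(counts_j) / n
    cov = sum((a - mean_i) * (b - mean_j) for a, b in zip(counts_i, counts_j))
    var_i = sum((a - mean_i) ** 2 for a in counts_i)
    var_j = sum((b - mean_j) ** 2 for b in counts_j)
    if var_i == 0 or var_j == 0:
        return 0.0
    return cov / math.sqrt(var_i * var_j)

def relationship(usage_sim, counts_i, counts_j, alpha=0.4):
    """R(w_i, w_j) = alpha * usage-similarity + (1 - alpha) * appearance-similarity."""
    return alpha * usage_sim + (1 - alpha) * appearance_similarity(counts_i, counts_j)
```

With α = 0.4, as in the simulations, two words whose per-text counts are perfectly correlated and whose usage-similarity is 0.5 get R = 0.4 · 0.5 + 0.6 · 1 = 0.8.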

^ A text is a set of sentences. In our paper, a text corresponds to a conversation.

^ As space is limited, see [3] for the details of the algorithm.
We model a simple conversation process between agents having word relation
matrices as their internal structures. Here, we focus on dynamical changes in the
internal structures of agents through exchanging sentences, the simplest act of
using language. A conversation between agents starts with uttering a sentence
about a topic displayed to the agents. After the beginning of the conversation,
each sentence is no longer restricted to the topic, but it bears some relevance to
the previous sentence. In this model, to express this relevance, at least one word
of the accepted sentence must be used in the reply sentence.

The procedure of conversation in a text is as follows: 

1. A speaker agent produces a sentence about a topic. 

2. The sentence is modified according to the creativity rate, c, and then uttered 
to a hearer agent. 

3. The speaker’s word relation matrix is updated in terms of the uttered sentence. 

4. The hearer accepts the uttered sentence if there are fewer than two unknown
words in the sentence^. If the sentence is not accepted, the speaker turns to
another topic (go to 1).

5. The hearer’s word relation matrix is updated in terms of the accepted sentence.
If there is an unknown word, the matrix is expanded to incorporate the new word.

6. To reply to the utterance, the roles of speaker and hearer are exchanged
(go to 1).

A text ends when the number of accepted sentences or the number of rejected
sentences in it reaches a preset maximum. Then another pair of agents and a
topic are selected for a new text.
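The six-step procedure and the end-of-text rule can be sketched as a loop. This is a toy stand-in under explicit assumptions: plain vocabulary sets replace the word relation matrices, `produce` is a hypothetical generator that ignores the topic and picks random known words, creativity is modeled as replacing each word with a random one with probability c, and "fewer than two unknown words" counts distinct words. All names are illustrative.

```python
import random

ALPHABET = "abcde"   # words are built from 5 characters

def random_word(rng, length=3):
    return "".join(rng.choice(ALPHABET) for _ in range(length))

def accepts(vocabulary, sentence):
    """Step 4: accept if the sentence has fewer than two distinct unknown words."""
    return len({w for w in sentence if w not in vocabulary}) < 2

def mutate(sentence, c, rng):
    """Step 2 (assumed form): each word is replaced by a random word with probability c."""
    return [random_word(rng) if rng.random() < c else w for w in sentence]

def produce(vocabulary, previous, rng, length=3):
    """Step 1 (hypothetical): random known words, reusing one word of the
    previous sentence when replying."""
    sentence = [rng.choice(sorted(vocabulary)) for _ in range(length)]
    if previous:
        sentence[0] = rng.choice(previous)
    return sentence

def run_text(speaker_vocab, hearer_vocab, c=0.1, max_accepted=100,
             max_rejected=5, rng=None):
    """Play one text; return the (accepted, rejected) sentence counts."""
    if rng is None:
        rng = random.Random()
    accepted = rejected = 0
    previous = None                          # None: start from a topic
    while accepted < max_accepted and rejected < max_rejected:
        sentence = mutate(produce(speaker_vocab, previous, rng), c, rng)
        speaker_vocab.update(sentence)       # step 3
        if accepts(hearer_vocab, sentence):
            hearer_vocab.update(sentence)    # step 5: learn the new word
            accepted += 1
            previous = sentence
            speaker_vocab, hearer_vocab = hearer_vocab, speaker_vocab  # step 6
        else:
            rejected += 1
            previous = None                  # step 4: turn to another topic
    return accepted, rejected
```

With c = 0 every sentence uses known words, so a text between agents sharing a vocabulary runs to the 100-sentence limit; raising c makes rejections, and hence shorter texts, more likely.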

3 Summary of Simulation Results 

In each conversation, two of five agents are randomly selected as a speaker and
a hearer. Agents produce sentences artificially by arranging words that are built
from combinations of 5 different characters^. The number of topics to be displayed
to agents is 10. The maximum number of accepted sentences in a text is 100 and that
of rejected sentences is 5. The parameters are α = 0.4 and c = 0.1.

The major results are as follows:

1. Agents develop cluster structure in their own word relation matrices. We
observe two characteristic types of clusters. One is the flat type, in which words
are strongly related to each other. The other is the gradual type, in which word
relationships change gradually. As a result of development, these two types of
clusters exist in combination.

2. Relationships among words change drastically when a new word is used
or a word is used in an unusual way. For example, at the 21st text in Fig. 1(a),
most words strongly related to a word in new usage weaken their relation
value, and vice versa.

^ Note that the hearer’s criterion for accepting an uttered sentence limits its
ability to make sense of new words.

^ The numbers of words and sentences are in principle infinite.
3. The structure of clusters has both stability and adaptability. The change of
position of words in the cluster structure is exemplified in Fig. 1(b). The word in
new usage, linked with a dashed arrow, moves to a different cluster. The other
words move so coherently within each cluster that the whole structure of clusters
is not modified very much.

4. Structure common to agents develops in the course of conversations. 

5. Agents also develop structure peculiar to individuals, because they go 
through diversified experiences of conversations. 




Fig. 1. (a) Transition of word relationship. The x and y axes are the number of texts
and the relationship of all words with a word in an agent’s word relation matrix,
respectively. (b) Dynamics of cluster structure caused by the rapid change shown in
(a). This is a scatter diagram from principal component analysis of the matrices
(axis label: first principal component).



4 Discussions 

The clustering can be regarded as categorization through conversations, since
words in a cluster have stronger relations with each other and weaker relations
with words outside the cluster. Typical clusters are a combination of the two types
of clusters, flat and gradual, that is, a flat center with a gradual expansion into
the periphery. The cluster structure shares some characteristics with the prototype
category [6,8].

In the traditional notion of category, the membership of a category is thought
to be defined rigidly, as in set theory. In prototype category theory, in
contrast, membership is a matter of degree and the boundary of a category
is fuzzy. The category of liquid containers provides an example. Bottles and
glasses are the typical members of the category. Glasses are similar to bowls,
bowls to soup plates, and soup plates to flat dishes. Although neighboring
members are fairly similar, the last one may not be a member of the category,
but the boundary that defines membership of the category is unclear.
Another important feature of prototype categories is the stability and adaptability
with which languages should be equipped to establish communication and
to be flexible about changes. Prototype categories and our cluster structure are
akin in these traits^.

Agents develop both commonality and individuality. The structure common
to agents implies the emergence of a social system, in which some words are used
in the same way by most agents. Speculatively, these words acquire
virtual references in the society^. For a developmental enquiry, we should study
how a word relationship that reflects the relations among prepared entities changes
or expands with communication.

The present algorithm shows not a simple convergence but drastic turnovers,
which are usually brought about by new combinations of words. The turnover
behavior locally restructures words in clusters. Such new combinations of words are
like metaphorical expressions, which often tie different semantic domains together by
using words from the separate domains in one sentence. And such metaphorical
expressions, if they are sufficiently striking, might modify our internal models, or
world view, dramatically. Therefore, it is important for the dynamics of linguistic
categorization not only to develop clusters but also to allow the clusters to be
modified by a small impact. This is also important for maintaining the dynamics at
the global level.

The coefficient α controls the nonlinearity of the present system.
Although the results reported here are observed over a broad range of α, the system
is likely to fall into a fixed and uniform structure when α is too large. If
the creativity rate c is too large, the system has too much randomness for us
to find any significant structure in word relation matrices and their dynamics.